


WORLD INTELLECTUAL PROPERTY ORGANIZATION 

International Bureau 




PCT 

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification 6 
C07K 1/14, 14/435 



A1

(11) International Publication Number: WO 97/15588
(43) International Publication Date: 1 May 1997 (01.05.97)



(21) International Application Number: PCT/US96/17325

(22) International Filing Date: 25 October 1996 (25.10.96) 



(30) Priority Data:
60/005,976 26 October 1995 (26.10.95) US
60/006,802 15 November 1995 (15.11.95) US



(71)(72) Applicants and Inventors: RUDENKO, Gabrielle
[US/US]; Apartment 2145, 6445 Shady Brook Lane, Dallas,
TX 75206 (US). DAZZO, Alessandra [IT/US]; 159 East
Cherry Drive, Memphis, TN 38113 (US). HOL, Wim, G.,
J. [NL/US]; 18332 57th Avenue, N.E., Seattle, WA 98155
(US).

(74) Agents: FOX, Samuel, L. et al.; Sterne, Kessler, Goldstein &
Fox P.L.L.C., Suite 600, 1100 New York Avenue, N.W.,
Washington, DC 20005-3934 (US).



(81) Designated States: AU, CA, JP, US, European patent (AT, BE,
CH, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL,
PT, SE).



Published 

With international search report. 

Before the expiration of the time limit for amending the 
claims and to be republished in the event of the receipt of 
amendments. 



(54) Title: PROTECTIVE PROTEIN/CATHEPSIN A AND PRECURSOR: CRYSTALLIZATION, X-RAY DIFFRACTION, THREE-DIMENSIONAL STRUCTURE DETERMINATION AND RATIONAL DRUG DESIGN



(57) Abstract 

The present invention provides crystallized protective protein/cathepsin A (PPCA), a precursor thereof (pPPCA) or at least one subdomain thereof; methods for x-ray diffraction analysis to provide x-ray diffraction patterns of sufficiently high resolution for three-dimensional structure determination of the protein; as well as methods for rational drug design (RDD), based on using amino acid sequence data and/or x-ray crystallography data provided on computer readable media, as analyzed on a computer system having suitable computer algorithms.
Protective Protein/Cathepsin A and Precursor: Crystallization, X-Ray Diffraction, Three-Dimensional Structure Determination and Rational Drug Design



Background of the Invention 

Statement as to Rights to Inventions Made Under 
Federally-Sponsored Research and Development 

Part of the work performed during development of this invention utilized U.S. Government funds. The U.S. Government has certain rights in this invention.

Field of the Invention 

The present invention is in the fields of molecular biology, protein purification, protein crystallization, x-ray 
diffraction analysis, three-dimensional structure determination and rational drug design (RDD). The present invention 
provides crystallized protective protein/cathepsin A (PPCA) and its precursor (pPPCA). The crystallized PPCA or 
pPPCA is analyzed by x-ray diffraction techniques. The resulting x-ray diffraction patterns are of sufficiently high 
resolution to be useful for determining the three-dimensional structure of the PPCA or pPPCA protein, and for RDD.
Related Background Art

Deficiency of the human protective protein/cathepsin A (PPCA, also known as human protective protein or HPP) has been identified as the primary genetic defect underlying galactosialidosis (d'Azzo et al., Proc. Natl. Acad. Sci. U.S.A. 79:4535-4539 (1982)), a lysosomal storage disease inherited as an autosomal recessive trait. Patients with this disorder are diagnosed as having drastically reduced β-galactosidase and neuraminidase activities in their cell lysosomes. Examples of lysosomal storage diseases are presented in Table 316-1 of Braunwald et al., eds., Harrison's Principles of Internal Medicine, 11th Ed., pp. 1661-1671, McGraw Hill Book Co., New York (1987); as well as Wenger et al., Biochem. Biophys. Res. Commun. 52:589-595 (1978); Tettamanti et al., eds., Sialidases and Sialidosis, Perspectives in Inherited Metabolic Diseases, Vol. 4, Edi. Ermes, Milano (1981), pp. 261-279 and 379-395; and van Diggelen et al., Lancet 2:804 (1987), which references are entirely incorporated herein by reference.

Researchers have proposed that one of PPCA's functions is to stabilize β-galactosidase and neuraminidase in a multi-enzyme complex, which complex is deficient in galactosialidosis patients (d'Azzo et al. (1982), infra; Hoogeveen et al. (1983), infra). Evidence for this protective function comes from studies showing that PPCA is taken up from the culture medium by galactosialidosis fibroblasts and that PPCA restores both β-galactosidase and neuraminidase activities to these fibroblasts (d'Azzo et al. (1982), infra).

The cDNA for PPCA directs the synthesis of a 452 amino acid precursor PPCA (pPPCA) (Figure 13) with a molecular weight of 54 kDa (Galjart et al., Cell 54:755-764 (1988)). The amino acid sequences of PPCA (Figure 14) and pPPCA (Figure 13) contain two glycosylation sites (Asn 117 and Asn 305), both of which are glycosylated in cultured fibroblasts and in cells over-expressing PPCA or pPPCA. pPPCA dimerizes soon after synthesis in the endoplasmic reticulum (ER) (Zhou et al., EMBO J. 10:4041-4048 (1991)).

Lysosomal PPCA has cathepsin A/deamidase/esterase activities, which are exerted in vitro on a specific subset of bioactive peptides. Non-limiting examples of peptides hydrolyzed by PPCA are: substance P and substance P-free acid; oxytocin and oxytocin-free acid; neurokinin A; angiotensin I; and bradykinin (Jackman (1990), infra). Furthermore, the enzyme inactivates endothelin-1 activity in rat smooth muscle cells and normal human tissues. This activity was deficient in liver from a galactosialidosis patient (Itoh, infra, 1995; Jackman et al., J. Biol. Chem. 267:2872-2875 (1992)).

Endothelins (ET-1, ET-2 and ET-3) are potent vasoconstrictors and elevate blood pressure in mammals. They also influence cell proliferation and hormone production and have been implicated in cardiovascular disorders, ranging from hypertension to stroke to ischemic heart disease (Rubanyi and Polokoff, Pharmacol. Rev. 45:325-415 (1994)).

The three-dimensional structure of a PPCA or a pPPCA has not previously been published. Such a structure could help delineate specific biological activities and identify ligands as therapeutics for PPCA-related pathologies. Accordingly, there is a need to provide three-dimensional structures of at least one PPCA, pPPCA or ligand for diagnosis or therapy of PPCA-related pathologies.
Summary of the Invention
The present invention provides methods of expressing, purifying and crystallizing a human protective protein/cathepsin A (PPCA) and its precursor, precursor protective protein/cathepsin A (pPPCA). The present invention also provides methods for obtaining crystallized PPCA or pPPCA that can be analyzed to obtain x-ray diffraction patterns of sufficiently high resolution to be useful for three-dimensional structure determination of the protein.

The x-ray diffraction patterns can be either analyzed directly to provide the three-dimensional structure (if of sufficiently high resolution), or atomic coordinates for the crystallized PPCA or pPPCA, as provided herein, can be used for structure determination. The x-ray diffraction patterns obtained by methods of the present invention, and provided on computer readable media, are used to provide electron density maps. The amino acid sequence is also useful for three-dimensional structure determination. The data are then used in combination with phase determination (e.g., using multiple isomorphous replacement (MIR) or molecular replacement techniques) to generate electron density maps of a PPCA or a pPPCA, using a suitable computer system.
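Combining diffraction data with phases to obtain an electron density map is a Fourier synthesis, ρ(x) = (1/V) Σ_h F(h) exp(-2πi h·x). As a minimal illustration only, the one-dimensional Python sketch below computes structure factors for two hypothetical point atoms in a cell of length 1 and then resynthesizes the density from them. The atom positions, the cutoff `h_max` and the function names are illustrative assumptions, not part of the described method.

```python
import cmath
import math

def structure_factors(positions, f, h_max):
    """F(h) = sum_j f * exp(2*pi*i*h*x_j) for h = -h_max..h_max (1D, fractional x)."""
    return {h: sum(f * cmath.exp(2j * math.pi * h * x) for x in positions)
            for h in range(-h_max, h_max + 1)}

def density(F, x):
    """Fourier synthesis rho(x) = sum_h F(h) * exp(-2*pi*i*h*x), cell length (V) = 1."""
    rho = sum(Fh * cmath.exp(-2j * math.pi * h * x) for h, Fh in F.items())
    return rho.real  # imaginary part cancels because F(-h) = conj(F(h))

atoms = [0.20, 0.55]                      # hypothetical fractional coordinates
F = structure_factors(atoms, f=1.0, h_max=10)
grid = [i / 100 for i in range(100)]
rho = [density(F, x) for x in grid]
peaks = sorted(range(100), key=lambda i: rho[i], reverse=True)[:2]
print(sorted(grid[i] for i in peaks))     # [0.2, 0.55]
```

The density peaks reappear at the atom positions only because each F(h) carries both amplitude and phase. A diffraction experiment records amplitudes only, which is why a phasing method such as MIR or molecular replacement is needed.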

The electron density maps, provided by analysis of either the x-ray diffraction patterns or working backwards from the atomic coordinates provided herein, are then fitted using suitable computer algorithms to generate secondary, tertiary and/or quaternary domains of a PPCA or a pPPCA. These domains are then used to provide an overall three-dimensional structure, as well as expected binding and active sites of the PPCA or pPPCA. pPPCA has some of the active and binding sites of PPCA, except for changes in structure due to the presence of the portion of the pPPCA which is deleted during maturation to PPCA (e.g., residues 285-298 of Figure 13).

Structure determination methods and computer systems are also provided by the present invention for rational drug design (RDD). These RDD methods use computer modeling programs to find potential ligands that are calculated to associate with, or bind to, sites or domains of a PPCA or a pPPCA. Potential ligands are then screened for modulating or binding activity. Such screening methods can be selected from assays for at least one PPCA-specific structural feature or biological activity, preferably as associated with a PPCA- or pPPCA-related pathology, e.g., protective activity (e.g., modulation of β-galactosidase activity and neuraminidase (NA) activity); and peptide or enzyme modulating activity (e.g., of endothelin-1 (serine carboxypeptidase), neuropeptides, cathepsin A, and the like), according to known assays. The resulting ligands provided by methods of the present invention are synthesized and are useful for treating, inhibiting or preventing at least one PPCA-related pathology in a mammal.

Other objects of the invention will be apparent to one of ordinary skill in the art from the following detailed 
description and examples relating to the present invention. 

Brief Description of the Figures

Figure 1 is a schematic ribbon diagram of the PPCA monomer (monomer 1), where secondary structure assignments are according to DSSP (Kabsch and Sander, Biopolymers 22:2577-2637 (1983)). The 'core' domain is shown in yellow. The 'cap' domain consists of a 'helical' subdomain, in red, and a 'maturation' subdomain, in orange. The catalytic triad Ser 150, His 429 and Asp 372 (from right to left) is shown by small green spheres. (Figure generated using MOLSCRIPT (Kraulis, J. Appl. Cryst. 24:946-950 (1991)).)

Figure 2 is a stereo diagram of the Cα trace of PPCA monomer 1 with numbering of selected residues. The residues forming the α-helices and β-strands are as follows, according to DSSP:

Core domain: Cβ1 (21-27); Cβ2 (32-39); Cβ3 (50-54); Cα1 (63-67); Cβ4 (73-75); Cβ5 (82-84); Cβ6 (94-98); Cα2 (118-135); Cβ7 (144-149); Cα3 (152-163); Cβ8 (171-177); Cα4 (307-313); Cα5 (316-321); Cα6 (336-341); Cα7 (350-359); Cβ9 (363-369); Cα8 (377-386); Cβ10 (391-401); Cβ11 (407-416); Cβ12 (419-424); Cα9 (431-434); Cα10 (436-447).

Cap domain: Hα1 (183-196); Hα2 (202-212); Hα3 (226-240); Mβ1 (261-264); Mβ2 (267-270); Mα1 (290-293); Mβ3 (296-299). Note that for monomer 2 the secondary structure assignments in the cap domain are slightly different than in monomer 1. Residues in Hβ1 are in a region of poor density and Mα1 is an extended coil. (Figure generated using MOLSCRIPT (Kraulis (1991), infra).)
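The element boundaries listed for Figure 2 can be cross-checked against the domain boundaries given for Figure 15 (core: residues 1-182 and 303-452; cap: residues 183-302). The Python sketch below transcribes the monomer 1 assignments, writing α as 'a' and β as 'b' in the element names, and reports any element lying outside its domain. The dictionary layout and the helper name `inside` are illustrative choices.

```python
# Monomer 1 element boundaries from the Figure 2 legend ('a' = alpha, 'b' = beta).
CORE = {"Cb1": (21, 27), "Cb2": (32, 39), "Cb3": (50, 54), "Ca1": (63, 67),
        "Cb4": (73, 75), "Cb5": (82, 84), "Cb6": (94, 98), "Ca2": (118, 135),
        "Cb7": (144, 149), "Ca3": (152, 163), "Cb8": (171, 177), "Ca4": (307, 313),
        "Ca5": (316, 321), "Ca6": (336, 341), "Ca7": (350, 359), "Cb9": (363, 369),
        "Ca8": (377, 386), "Cb10": (391, 401), "Cb11": (407, 416),
        "Cb12": (419, 424), "Ca9": (431, 434), "Ca10": (436, 447)}
CAP = {"Ha1": (183, 196), "Ha2": (202, 212), "Ha3": (226, 240), "Mb1": (261, 264),
       "Mb2": (267, 270), "Ma1": (290, 293), "Mb3": (296, 299)}

# Domain boundaries as given for Figure 15.
CORE_RANGES = [(1, 182), (303, 452)]
CAP_RANGES = [(183, 302)]

def inside(span, ranges):
    """True if the whole residue span lies within one of the ranges."""
    start, end = span
    return any(lo <= start and end <= hi for lo, hi in ranges)

misplaced = [name for name, span in CORE.items() if not inside(span, CORE_RANGES)]
misplaced += [name for name, span in CAP.items() if not inside(span, CAP_RANGES)]
print(misplaced)  # []
```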
Figure 3 shows the density for the disulfide bridges Cys 212-Cys 228 and Cys 213-Cys 218 as revealed in the SigmaA-weighted 2mFo-DFc electron density map (Read, Acta Crystallogr. A42:140-149 (1986)) calculated from the model refined to 2.2 Å; the map has been contoured at 1σ. (Figure drawn with the O computer program (Jones, Acta Crystallogr. A47:110-119 (1991)).)
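For an acentric reflection, a SigmaA-weighted 2mFo-DFc map (Read, 1986) uses the Fourier coefficient (2m|Fo| - D|Fc|) exp(iφc), where m is the figure of merit, D is the SigmaA scale factor and φc is the phase of the model structure factor. The Python sketch below computes one such coefficient. The numerical inputs are hypothetical, and the centric case (which uses m|Fo| instead) is deliberately not handled.

```python
import cmath

def sigmaa_2mfo_dfc(f_obs, f_calc, m, d):
    """Map coefficient (2m|Fo| - D|Fc|) * exp(i*phi_c) for one acentric reflection.

    f_obs: observed amplitude |Fo|; f_calc: complex model structure factor Fc;
    m: figure of merit; d: SigmaA scale factor D. The model phase phi_c is kept.
    """
    phi_c = cmath.phase(f_calc)
    amplitude = 2 * m * f_obs - d * abs(f_calc)
    return cmath.rect(amplitude, phi_c)

# Hypothetical reflection: |Fo| = 100, Fc = 80 at phase 0.5 rad, m = 0.9, D = 0.95.
coef = sigmaa_2mfo_dfc(f_obs=100.0, f_calc=cmath.rect(80.0, 0.5), m=0.9, d=0.95)
print(round(abs(coef), 2), round(cmath.phase(coef), 3))  # 104.0 0.5
```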
Figure 4 is a stereo diagram of the superimposed Cα traces from the two crystallographically independent PPCA monomers forming the dimer. Monomer 1 is in blue, monomer 2 is in red. Residues referred to in the text are labeled. Residues 259 and 260 have not been incorporated in the model of monomer 2, since no electron density was observed for them. Note the striking difference in conformation of the excision peptide located in the upper right corner of the proteins. (Figure generated by MOLSCRIPT (Kraulis (1991), infra).)

Figure 5 is a schematic ribbon diagram of the PPCA dimer viewed approximately along the twofold axis. For monomer 1, the core domain is yellow while the cap domain consists of a helical subdomain in red and a maturation subdomain in orange. For monomer 2, the core domain is green, while the cap domain consists of a blue helical subdomain and a light blue maturation subdomain. (Figure generated using MOLSCRIPT (Kraulis (1991), infra).)

Figure 6A-B is a representation of the molecular surface of the PPCA dimer. The surface was calculated with GRASP (Nicholls, A., et al., Proteins 11:281-296 (1991)) and colored according to the electrostatic potential. Dark blue corresponds to a positive potential > +15.0 kT/e and dark red to a negative potential < -15.0 kT/e. Figure 6A: standard view, along the diad, with the dimer oriented as in Figure 4. Figure 6B: side view of the dimer, rotated ninety degrees with respect to 6A.

Figure 7A-F presents a topological comparison of 6 members of the hydrolase fold family. The arrangement of structural elements in the central core domain (in green and yellow) of the different proteins is generally similar. The cap domains (in red) vary greatly. The following structures are shown starting from the top left hand corner (references and PDB entry codes are given in brackets): Figure 7A shows the PPCA precursor, whose cap domain consists of two subdomains, one α-helical and the other mainly β-sheet; Figure 7B shows CPW (3SC2; Liao et al. (1992), infra), cap domain helical; Figure 7C shows CPY (1YSC; Endrizzi et al. (1994), infra), cap domain helical; Figure 7D shows dehalogenase (2HAD; Franken et al., EMBO J. 10:1297-1302 (1991)), cap domain helical but quite different from the serine carboxypeptidases; Figure 7E shows lipase from Pseudomonas glumae (1TAH; Noble et al., FEBS Lett. 331:123-128 (1993)), cap domain mixed α-helical and β-strands; and Figure 7F shows acetylcholinesterase (1ACE; Sussman et al., Science 253:872-879 (1991)), cap domain large and predominantly α-helical. The secondary structure assignments were generated with the computer program O, using structures provided by and/or available from the Brookhaven Protein Data Bank. (This Figure was generated using MOLSCRIPT (Kraulis (1991), infra).)

Figure 8A-B compares the PPCA and CPW structures. Figure 8A shows the superposition of the Cα traces from the PPCA and CPW monomers, showing that the major differences between the two enzymes are localized in the cap domain. PPCA has a large 'maturation' subdomain, and the 'helical' subdomain is rotated with respect to its CPW counterpart. (Figure drawn with the O program (Jones (1991), infra).) Figure 8B shows the Cα traces from the PPCA and CPW dimers after the core domains from the subunits shown on the right hand side of the two dimers have been superimposed. Notice the remarkable difference in mutual orientation (of 15°) of the two subunits on the left hand side of the two dimers, which has been accentuated by an arrow. (Figure drawn with the O computer program (Jones (1991), supra).)
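Superpositions like those in Figure 8A-B are commonly computed by least-squares fitting of paired Cα atoms. The text does not say which algorithm O uses, so the Python/NumPy sketch below should be read as one standard choice, the Kabsch algorithm, applied to hypothetical coordinates. It finds the rotation that best superimposes one paired point set on another and returns the resulting RMSD.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """Superimpose paired point sets P onto Q (both N x 3) and return the RMSD.

    Kabsch algorithm: center both sets, take the SVD of the 3 x 3 covariance
    matrix, and flip one axis if needed so the result is a proper rotation.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                              # covariance of the paired coordinates
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # -1 would indicate a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T    # optimal rotation of P onto Q
    diff = (R @ Pc.T).T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))

# Hypothetical C-alpha set, rotated 15 degrees about z and translated.
rng = np.random.default_rng(0)
ca_a = rng.normal(size=(20, 3)) * 10.0
theta = np.radians(15.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
ca_b = ca_a @ Rz.T + np.array([5.0, -3.0, 2.0])
print(round(kabsch_rmsd(ca_a, ca_b), 6))  # 0.0
```

Because the second set is an exact rigid-body copy of the first, the fitted RMSD is zero. With real PPCA and CPW traces, the residual would reflect genuine structural differences such as those in the cap domain.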

Figure 9 is a stereo view of the Cα trace of PPCA monomer 1 highlighting regions involved in the maturation event. Color scheme for the trace is as follows: core domain in light blue, helical subdomain in red, maturation subdomain in orange, with the exception of the excision peptide (residues 285-298), which is shown in blue. Orange spheres mark residues 272 and 277, the beginning and end of the blocking peptide. The catalytic triad Ser 150, His 429 and Asp 372 is shown as light blue spheres. Two cysteines, Cys 253 and Cys 303, referred to in the discussion are colored green. (This Figure generated using MOLSCRIPT (Kraulis (1991), infra).)

Figure 10 is a close-up representation of the 'blocking' peptide (residues 272-277) bound in the active site, rendering the catalytic triad solvent inaccessible. Residues from the maturation subdomain are shown in orange, residues from the helical domain in magenta and residues from the core domain in cyan. The excision peptide is shown in blue. Side chains are shown for residues making extensive contacts with the blocking peptide or mentioned in the text. The catalytic triad is shown in white. (Figure drawn with O (Jones (1991), infra).)

WO 97/15588 PCT/US96/17325

Figure 11 is a representation of elements proposed to be involved in the activation mechanism of the precursor form of PPCA as discussed in the text. The Cα trace of the core domain is shown in cyan, the helical subdomain in red, the maturation subdomain in orange, and the excision peptide in blue. Relevant side chains are depicted and labeled. Rearrangement of residues 254-302, bounded by the disulfide between Cys 253 and Cys 303, would free up the active site cleft. A charge cluster of Arg 262, Glu 264, Arg 298 and Asp 300 occupies a strategic position within the maturation subdomain and is possibly involved in pH-dependent regulation of conformational changes. The solvent accessible surface was calculated and visualized with the atomic coordinates by BIOGRAF (BIOGRAF Construct Users Guide, Version 3.2.1, June 1993).

Figure 12 is a schematic representation of the proposed activation of PPCA. The active site cleft is formed by the core domain (indicated as 'core' in the scheme) and the helical subdomain (indicated as 'α'). The maturation subdomain (indicated as 'm') contains the residues that block the active site cleft, rendering the precursor enzymatically inactive, as shown in structure 1. In the acidic endosome/lysosome, the precursor undergoes activation. In activation pathway 2a, conformational rearrangements induced by low pH might render the excision peptide more accessible to proteases as a first step, followed by cleavage of the polypeptide chain removing the excision peptide. Alternatively, in pathway 2b, proteolytic cleavage of the excision peptide might form the trigger for the total rearrangement, removing the blocking peptide from the active site and thus generating the fully active enzyme, as shown in structure 3.
Figure 13 shows the amino acid sequence of a human pPPCA. The underlined portion (residues 285-298) shows an excision peptide for conversion to the mature form, PPCA.

Figure 14 shows the amino acid sequence of a human PPCA. 

Figure 15 shows a sequence alignment between pPPCA, CPW and CPY (top three sequences shown). Residues identical among all three sequences are boxed. Residue numbering is included for the pPPCA amino acid sequence. The alignment was made using the GCG program PILEUP (GCG version 8), then manually adjusted using 3D-structural knowledge from the superposition of the CPW (Liao et al., 1992) and CPY (Endrizzi et al., 1994) atomic coordinates. The alignment was later used to design a multi-Ala search probe for molecular replacement calculations, shown as the fourth sequence ('model'). The structure determination of pPPCA subsequently revealed that the protein can be divided into two domains: a 'core' domain (residues 1-182 and 303-452) and a 'cap' domain (residues 183-302). The secondary structure elements for the PPCA precursor are depicted with shaded bars (for details on the assignment and nomenclature, see Rudenko et al., Structure 3:1249-1259 (1995)).

Figure 16 shows a schematic representation of a 'bootstrapping' cycle as described in Example 2.

Figure 17 is a representation of an initial molecular mask enlarged to accommodate missing areas in the model. The program MAMA (Kleywegt & Jones, 1994) was used to calculate the mask, and mask editing options in O (Jones et al., 1991) were used to extend the mask.

Figure 18 is a representation of the enlargement of the model during the bootstrapping procedure, plotted as a function of the expansion step. The number of Cα atoms incorporated in the model per monomer is given (-o-), as well as the number of correct side chains (-*-). Note that after the first round of building in the molecular replacement map (expansion step 'mr'), 37 residues from the molecular replacement search probes had to be deleted from the model, reducing the number of Cα atoms to 294. Subsequent cycles allowed the model to be expanded by small increments.

Figure 19 is a representation of a comparison of the Cα trace from a monomer core model (shown in magenta) and the complete PPCA monomer (shown in yellow). The core model contained only 294 Cα atoms. The 452-residue PPCA monomer consists of a core domain and a cap domain; the helical subdomain and the maturation subdomain forming the cap domain are shown in the figure.
Figure 20A-D is a representation of the resolving power of the bootstrapping procedure, showing three different stages in map quality. The atomic coordinates of the refined model are visualized with the electron density in Figures 20B, 20C and 20D. Figures 20A and 20B show the initial 2m|Fobs|-D|Fcalc| SigmaA-weighted map calculated using phases from the molecular replacement solution; the electron density is essentially uninterpretable. Fig. 20C shows the twofold-averaged 2|Fobs|-|Fcalc| electron density map calculated using inverted phases from cycle bmc6. The density for β-strand Mβ2 (residues 266-271) has become clearly visible. Fig. 20D shows the unaveraged 2m|Fobs|-D|Fcalc| SigmaA-weighted map calculated using phases from the refined model. The quality of the density is very good. Density for helix Mα1 (residues 287-293), which assumes a different conformation in the two monomers, is now also apparent.

Figure 21 shows a Ramachandran plot calculated for one monomer from a refined model of a pPPCA. Both monomers in the asymmetric unit give essentially equivalent plots.
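A Ramachandran plot such as Figure 21 places each residue by its backbone dihedral angles φ (C(i-1)-N(i)-Cα(i)-C(i)) and ψ (N(i)-Cα(i)-C(i)-N(i+1)). The Python sketch below shows the underlying dihedral-angle calculation for four points. The test points are hypothetical and chosen so that the cis, trans and perpendicular cases are easy to verify by hand.

```python
import math

def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, e.g. C(i-1), N, CA, C for phi."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)   # normal to the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

p0, p1, p2 = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
print(round(dihedral(p0, p1, p2, (1.0, 1.0, 0.0)), 1))        # 0.0 (cis)
print(round(abs(dihedral(p0, p1, p2, (1.0, -1.0, 0.0))), 1))  # 180.0 (trans)
print(round(dihedral(p0, p1, p2, (1.0, 0.0, 1.0)), 1))        # 90.0
```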

Figure 22 shows a schematic of a computer system for PPCA or pPPCA structure determination and/or rational drug design.

Figure 23.1-52 lists the atomic coordinates for the active site of a pPPCA dimer having the amino acid sequence presented as portions of at least one of residues 50-76, 144-155, 173-197, 226-253, 226-288, 294-310, 327-344, 338-350, 366-381 and 423-436 of the 452 amino acids (designated 1-452) of monomer 1 (Figure 23.1-23.26), as well as corresponding portions of the 452 amino acids (designated 1001-1452) of monomer 2 (Figure 23.26-23.52).
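If the Figure 23 coordinates are available in standard fixed-column PDB format (an assumption, since the listing format is not described here), the residue ranges above can be used to extract the active-site atoms of both monomers. The Python sketch below does this, mapping monomer 2 residue numbers (1001-1452) back onto the monomer 1 ranges. The helper `atom_line` only builds hypothetical test records, and all coordinates are hypothetical.

```python
# Residue ranges listed for Figure 23 (monomer 1 numbering, 1-452).
ACTIVE_SITE_RANGES = [(50, 76), (144, 155), (173, 197), (226, 253), (226, 288),
                      (294, 310), (327, 344), (338, 350), (366, 381), (423, 436)]

def in_active_site(resnum):
    """True if a residue (monomer 1: 1-452, monomer 2: 1001-1452) is in a listed range."""
    local = resnum - 1000 if resnum > 1000 else resnum
    return any(lo <= local <= hi for lo, hi in ACTIVE_SITE_RANGES)

def select_atoms(pdb_lines):
    """Yield (atom_name, resnum, x, y, z) for ATOM/HETATM records in the listed ranges.

    Assumes PDB fixed columns: atom name 13-16, residue number 23-26, x/y/z 31-54.
    """
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        resnum = int(line[22:26])
        if in_active_site(resnum):
            yield (line[12:16].strip(), resnum,
                   float(line[30:38]), float(line[38:46]), float(line[46:54]))

def atom_line(serial, name, resname, chain, resnum, x, y, z):
    """Build a minimal fixed-column ATOM record (hypothetical test data only)."""
    return (f"ATOM  {serial:5d} {name:<4} {resname:>3} {chain}{resnum:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

lines = [atom_line(1, "OG", "SER", "A", 150, 1.0, 2.0, 3.0),    # catalytic Ser 150
         atom_line(2, "CA", "GLY", "A", 100, 0.0, 0.0, 0.0),    # outside all ranges
         atom_line(3, "OG", "SER", "B", 1150, 4.5, 5.5, 6.5)]   # monomer 2 copy
selected = list(select_atoms(lines))
print(selected)
```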

Detailed Description of the Preferred Embodiments 
The present invention provides methods for expressing, purifying and crystallizing a protective protein/cathepsin A (PPCA) or a precursor protective protein/cathepsin A (pPPCA), where the crystals diffract x-rays with sufficiently high resolution to allow determination of the three-dimensional structure of the PPCA or pPPCA, or a portion or subdomain thereof. The three-dimensional structure (e.g., as provided on computer readable media of the present invention) is useful for rational drug design of ligands of a PPCA or a pPPCA. Such ligands can be synthesized or recombinantly produced and are useful as diagnostic agents or drugs for diagnosing, treating, inhibiting or preventing at least one PPCA- or pPPCA-related pathology.
The structure is determined using the PPCA or pPPCA amino acid sequences and/or atomic coordinate/x-ray diffraction data, which are analyzed to provide atomic model output data corresponding to the three-dimensional structure, e.g., as provided on computer readable media. The computer analysis of the atomic coordinate/x-ray diffraction data and/or the amino acid sequence allows the calculation of the secondary, tertiary and/or quaternary structures, domains and/or subdomains of the protein. These domains are combined and refined by additional calculations using suitable computer subroutines to determine the most probable or actual three-dimensional structure of the PPCA or pPPCA, including potential or actual active sites, binding sites or other structural or functional domains or subdomains of the protein.


Structure determination methods are also provided by the present invention for rational drug design (RDD) of PPCA or pPPCA ligands. Such drug design uses computer modeling programs that calculate different molecules expected to interact with the determined active sites, binding sites, or other structural or functional domains or subdomains of a PPCA or a pPPCA. These ligands can then be produced and screened for activity in modulating or binding to a PPCA or pPPCA, according to methods and compositions of the present invention.

The actual PPCA or pPPCA-ligand complexes can optionally be crystallized and analyzed using x-ray diffraction techniques. The diffraction patterns obtained are similarly used to calculate the three-dimensional interaction of the ligand and the PPCA or pPPCA, to confirm that the ligand binds to, or changes the conformation of, particular domain(s) or subdomain(s) of the PPCA or pPPCA. Such screening methods are selected from assays for at least one biological activity of a PPCA or a pPPCA. The resulting ligands, provided by methods of the present invention, modulate or bind at least one PPCA or pPPCA and are useful for diagnosing, treating or preventing PPCA- or pPPCA-related pathologies in animals, such as humans. Ligands of a particular PPCA or pPPCA can similarly modulate other PPCAs or pPPCAs from other sources, such as other eukaryotes.
A PPCA or pPPCA is also provided as a crystallized protein suitable for x-ray diffraction analysis. The x-ray diffraction patterns obtained by the x-ray analysis are of moderate, moderately high or high resolution, e.g., 30-10 Å, 10-3.5 Å or 1.5-3.5 Å, respectively, with higher resolutions also included. These diffraction patterns are suitable and useful for three-dimensional structure determination of a PPCA or a pPPCA, or a domain or subdomain thereof.
The determination of the three-dimensional structure of a PPCA or pPPCA has broad-based utility. Significant sequence identity and conservation of important structural elements are expected to exist among different PPCAs or pPPCAs. Therefore, the three-dimensional structure from one or a few PPCAs or pPPCAs can be used to identify ligands that have diagnostic or therapeutic value for at least one PPCA- or pPPCA-related pathology that may involve PPCAs or pPPCAs having different amino acid sequences.
Determination of Protein Structures

Different techniques give different and complementary information about protein structure. The primary structure is obtained by biochemical methods, either by direct determination of the amino acid sequence from the protein, or from the nucleotide sequence of the corresponding gene or cDNA. The quaternary structure of large proteins or aggregates can also be determined by electron microscopy. To obtain the secondary and tertiary structure, which requires detailed information about the arrangement of atoms within a protein, x-ray crystallography is preferred. See, e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra.

The first prerequisite for solving the three-dimensional structure of a protein by x-ray crystallography is a well-ordered crystal that will diffract x-rays strongly. The crystallographic method directs a beam of x-rays onto a regular, repeating array of many identical molecules so that the x-rays are diffracted from it in a pattern from which the structure of an individual molecule can be retrieved. Globular protein molecules are large, roughly spherical or ellipsoidal objects with irregular surfaces, so even well-ordered crystals of them contain large holes or channels formed between the individual molecules. These channels, which usually occupy more than half the volume of the crystal, are filled with disordered solvent molecules. The protein molecules are in contact with each other at only a few small regions. This is one reason why structures of proteins determined by x-ray crystallography are generally the same as those for the proteins in solution.
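The solvent fraction of a protein crystal is commonly estimated from the Matthews coefficient, V_M = V_cell / (Z·M) in Å³/Da, with solvent fraction ≈ 1 - 1.23/V_M (the constant assumes a typical protein partial specific volume). The Python sketch below applies this estimate. The unit cell and the number of molecules per cell are hypothetical, and only the 54 kDa precursor mass is taken from the text.

```python
import math

def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit-cell volume (cubic angstroms) from edges (angstroms) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(v)) for v in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)

def matthews(volume, mass_da, z):
    """Matthews coefficient V_M (A^3/Da) and estimated solvent fraction."""
    vm = volume / (mass_da * z)
    return vm, 1.0 - 1.23 / vm

vol = cell_volume(100.0, 120.0, 150.0)            # hypothetical orthorhombic cell
vm, solvent = matthews(vol, mass_da=54_000, z=8)  # z = 8 molecules per cell, hypothetical
print(round(vm, 2), round(solvent, 2))            # 4.17 0.7
```

A solvent fraction above 0.5, as in this made-up example, corresponds to the statement above that the solvent channels usually occupy more than half the crystal volume.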

The formation of crystals is dependent on a number of different parameters, including pH, temperature, protein 
concentration, the nature of the solvent and precipitant, as well as the presence of added ions or ligands to the protein. 
Many routine crystallization experiments may be needed to screen all these parameters for the few combinations that 
might give crystals suitable for x-ray diffraction analysis. Crystallization robots can automate and speed up the work of reproducibly setting up large numbers of crystallization experiments.
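A factorial screen of the kind a crystallization robot sets up can be enumerated directly from the parameters listed above. In the Python sketch below, the parameter values (pH, PEG concentration, temperature) and the 24-well plate size are hypothetical examples, not conditions used for PPCA.

```python
import math
from itertools import product

def grid_screen(ph_values, peg_percent, temperatures_c):
    """Return one condition dict per combination of pH, PEG % (w/v) and temperature."""
    return [{"pH": ph, "peg_percent": peg, "temp_c": t}
            for ph, peg, t in product(ph_values, peg_percent, temperatures_c)]

screen = grid_screen(
    ph_values=[4.5, 5.5, 6.5, 7.5],          # hypothetical buffer pH values
    peg_percent=[8, 12, 16, 20, 24, 28],     # hypothetical precipitant levels
    temperatures_c=[4, 20],                  # hypothetical incubation temperatures
)
plates = math.ceil(len(screen) / 24)         # assuming 24-well plates
print(len(screen), plates)                   # 48 2
```

Even this small grid yields 48 conditions, which illustrates why automated setup pays off once additional variables such as salts or additives are included.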

A pure and homogeneous protein sample is important for successful crystallization. Proteins obtained from 
cloned genes in efficient expression vectors can be purified quickly to homogeneity in large quantities in a few 
purification steps. A protein to be crystallized is preferably at least 93-99% pure according to standard criteria of 
homogeneity. Crystals form when molecules are precipitated very slowly from supersaturated solutions. The most frequently used procedure for making protein crystals is the hanging-drop method, in which a drop of protein solution is brought very gradually to supersaturation by loss of water from the droplet to the larger reservoir that contains salt or polyethylene glycol solution.
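A rough mass balance shows why the hanging drop concentrates. The Python sketch below relies on simplifications not stated in the text: only water moves through the vapor phase, the reservoir is large enough for its concentration to stay fixed, and equilibrium is reached when the drop's precipitant concentration equals the reservoir's. The volumes and concentrations are hypothetical.

```python
def hanging_drop_equilibrium(v_protein_ul, c_protein_mg_ml, v_reservoir_ul, c_precip_res):
    """Estimate final drop volume (uL) and protein concentration (mg/mL).

    Assumes only water leaves the drop and the reservoir concentration
    c_precip_res stays constant (reservoir much larger than the drop).
    """
    v0 = v_protein_ul + v_reservoir_ul                   # initial drop volume
    c_precip_drop0 = c_precip_res * v_reservoir_ul / v0  # precipitant diluted by mixing
    # Water evaporates until the drop's precipitant matches the reservoir's.
    v_final = v0 * c_precip_drop0 / c_precip_res
    c_protein_final = c_protein_mg_ml * v_protein_ul / v_final
    return v_final, c_protein_final

# Hypothetical drop: 2 uL protein (10 mg/mL) mixed with 2 uL reservoir (20% PEG).
v, c = hanging_drop_equilibrium(2.0, 10.0, 2.0, 20.0)
print(v, c)  # 2.0 10.0
```

Under these assumptions the drop shrinks back to the volume of reservoir solution originally mixed in. The protein concentration therefore rises from 5 mg/mL just after mixing to 10 mg/mL at equilibrium, and this gradual increase is what drives the solution toward supersaturation.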

Different crystal forms can be more or less well-ordered and hence give diffraction patterns of different quality. As a general rule, the more closely the protein molecules pack, and consequently the less water the crystals contain, the better the diffraction pattern, because the molecules are better ordered in the crystal.

X-rays are electromagnetic radiation at short wavelengths, emitted when electrons jump from a higher to a lower energy state. In conventional laboratory sources, x-rays are produced by high-voltage tubes in which a metal plate, the anode, is bombarded with accelerated electrons and thereby caused to emit x-rays of a specific wavelength, so-called monochromatic x-rays. The electron bombardment rapidly heats up the metal plate, which therefore has to be cooled. Efficient cooling is achieved by so-called rotating anode x-ray generators, where the metal plate revolves during the experiment so that the heat load is spread over different parts of the anode.

More powerful x-ray beams can be produced in synchrotron storage rings, where electrons (or positrons) travel close to the speed of light. These particles emit very strong radiation at all wavelengths from short gamma rays to visible light. When used as an x-ray source, only radiation within a window of suitable wavelengths is channeled from the storage ring. Polychromatic x-ray beams are produced by having a broad window that allows through x-ray radiation with wavelengths of 0.2-3.5 Å.
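Wavelength and photon energy are interchangeable through E = hc/λ, with hc ≈ 12.398 keV·Å. The Python sketch below converts the ends of the 0.2-3.5 Å window into energies. Cu Kα (1.5418 Å), a common laboratory anode line, is included only as a familiar reference point and is not mentioned in the text.

```python
HC_KEV_ANGSTROM = 12.3984  # h * c expressed in keV * angstrom

def photon_energy_kev(wavelength_angstrom):
    """Photon energy in keV for a wavelength in angstroms (E = hc / lambda)."""
    return HC_KEV_ANGSTROM / wavelength_angstrom

# Ends of the 0.2-3.5 A window, plus Cu K-alpha for comparison.
for lam in (0.2, 1.5418, 3.5):
    print(lam, round(photon_energy_kev(lam), 2))
```

The window therefore spans roughly 3.5 to 62 keV, while a copper-anode laboratory source delivers about 8 keV.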

In diffraction experiments a narrow and parallel beam of x-rays is taken out from the x-ray source and directed 
onto the crystal to produce diffracted beams. The incident primary beam causes damage to both protein and solvent 
molecules. The crystal is, therefore, usually cooled to prolong its lifetime (e.g., -220 to -50°C). The primary beam must
strike the crystal from many different directions to produce all possible diffraction spots, and so the crystal is rotated 
in the beam during the experiment. 

The diffracted spots are recorded either on a film, the classical method, or by an electronic detector. The 
exposed film has to be measured and digitized by a scanning device, whereas electronic detectors feed the signals they 
detect directly in digitized form into a computer. Electronic area detectors (an electronic film) significantly reduce
the time required to collect and measure diffraction data. 

When the primary beam from an x-ray source strikes the crystal, some of the x-rays interact with the electrons
on each atom and cause them to oscillate. The oscillating electrons serve as a new source of x-rays, which are emitted
in almost all directions; this is referred to as scattering. When atoms (and hence their electrons) are arranged in a
regular three-dimensional array, as in a crystal, the x-rays emitted from the oscillating electrons interfere with one
another. In most directions, these scattered x-rays cancel each other out; in certain directions, however, they add
together to produce diffracted beams of radiation that can be recorded as a pattern on a photographic plate or
detector.

The diffraction pattern obtained in an x-ray experiment is related to the crystal that caused the diffraction.
X-rays that are reflected from adjacent planes travel different distances, and diffraction only occurs when the
difference in distance is equal to the wavelength of the x-ray beam (or an integral multiple of it). This path difference
depends on the reflection angle, which is equal to the angle between the primary beam and the planes.

The relationship between the reflection angle (θ), the distance between the planes (d), and the wavelength (λ)
is given by Bragg's law: $2d \sin\theta = \lambda$. This relation can be used to determine the size of the unit cell in
the crystal. Briefly, the position of each spot on the film relates it to a specific set of planes through the crystal.
By using Bragg's law, these positions can be used to determine the size of the unit cell.
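As a minimal numerical illustration of Bragg's law, the sketch below solves $2d\sin\theta = n\lambda$ for the plane spacing d, and inversely for the reflection angle θ. The order n generalizes the first-order form given above; the Cu Kα wavelength of 1.5418 Å is an assumed laboratory value, and the function names are hypothetical.

```python
import math


def bragg_d_spacing(theta_deg: float, wavelength: float, n: int = 1) -> float:
    """Interplanar spacing d from Bragg's law, n * lambda = 2 * d * sin(theta).

    theta_deg is the reflection angle in degrees; wavelength and the returned d
    share the same length unit (Angstroms here).
    """
    theta = math.radians(theta_deg)
    if not 0.0 < theta <= math.pi / 2:
        raise ValueError("theta must lie in (0, 90] degrees")
    return n * wavelength / (2.0 * math.sin(theta))


def bragg_theta(d_spacing: float, wavelength: float, n: int = 1) -> float:
    """Inverse problem: angle (degrees) at which planes of spacing d diffract."""
    s = n * wavelength / (2.0 * d_spacing)
    if s > 1.0:
        raise ValueError("no reflection possible: n * lambda exceeds 2d")
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    cu_kalpha = 1.5418  # assumed Cu K-alpha laboratory wavelength, Angstroms
    # At theta = 30 degrees, sin(theta) = 1/2, so d equals the wavelength.
    print(f"d = {bragg_d_spacing(30.0, cu_kalpha):.4f} A")
    # Planes 3.5 A apart diffract at a larger angle than planes 10 A apart.
    print(f"theta(3.5 A) = {bragg_theta(3.5, cu_kalpha):.2f} deg")
    print(f"theta(10 A)  = {bragg_theta(10.0, cu_kalpha):.2f} deg")
```

Because d varies as 1/sin θ, spots far from the direct beam correspond to closely spaced planes, which is why they carry the high-resolution information.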

Each atom in a crystal scatters x-rays in all directions, and only those that positively interfere with one another,
according to Bragg's law, give rise to diffracted beams that can be recorded as a distinct diffraction spot above
background. Each diffraction spot is the result of interference of all x-rays with the same diffraction angle emerging
from all atoms. For example, for the protein crystal of myoglobin, each of the about 20,000 diffracted beams that have
been measured contains scattered x-rays from each of the around 1,500 atoms in the molecule. Extracting information
about individual atoms from such a system requires considerable computation. The mathematical tool that is used to
handle such problems is called the Fourier transform.
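The sum over atoms that the Fourier transform handles can be sketched for a toy case. The code below computes the standard structure factor $F(hkl) = \sum_j f_j \exp[2\pi i (h x_j + k y_j + l z_j)]$, assuming point atoms with constant scattering factors f (it ignores the fall-off of f with angle and thermal motion); the two-atom cell is hypothetical. Every atom contributes to every reflection, and the complex result carries both an amplitude and a phase.

```python
import cmath
import math
from typing import Sequence, Tuple

# (f, x, y, z): scattering factor and fractional coordinates of one point atom.
Atom = Tuple[float, float, float, float]


def structure_factor(atoms: Sequence[Atom], hkl: Tuple[int, int, int]) -> complex:
    """F(hkl) = sum_j f_j * exp(2*pi*i*(h*x_j + k*y_j + l*z_j))."""
    h, k, l = hkl
    total = 0j
    for f, x, y, z in atoms:
        total += f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
    return total


if __name__ == "__main__":
    # Hypothetical toy cell: two identical atoms half a cell apart along x.
    toy = [(6.0, 0.0, 0.0, 0.0), (6.0, 0.5, 0.0, 0.0)]
    for hkl in [(1, 0, 0), (2, 0, 0)]:
        F = structure_factor(toy, hkl)
        print(hkl, "|F| =", round(abs(F), 3))
```

With the atoms half a cell apart, their (1,0,0) contributions cancel while their (2,0,0) contributions add to 2f, the destructive and constructive interference described above.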

Each diffracted beam, which is recorded as a spot on the film, is defined by three properties: the amplitude,
which we can measure from the intensity of the spot; the wavelength, which is set by the x-ray source; and the phase,
which is lost in x-ray experiments. All three properties are needed for all of the diffracted beams in order to determine
the positions of the atoms giving rise to the diffracted beams.

For larger molecules, protein crystallographers have in many cases determined the phases using a method called
multiple isomorphous replacement (MIR) (including heavy-metal scattering), which requires the introduction of new
x-ray scatterers into the unit cell of the crystal. These additions are usually heavy atoms (so that they make a significant
contribution to the diffraction pattern); there should not be too many of them (so that their positions can be
located); and they should not change the structure of the molecule or of the crystal cell, i.e., the crystals should be
isomorphous. Isomorphous replacement is usually done by diffusing different heavy-metal complexes into the channels
of the preformed protein crystals. The protein molecules expose side chains (such as SH groups) into these solvent
channels that are able to bind heavy metals. It is also possible to replace endogenous light metals in metalloproteins with
heavier ones, e.g., zinc by mercury, or calcium by samarium.

Since such heavy metals contain many more electrons than the light atoms (H, N, C, O, and S) of the protein,
they scatter x-rays more strongly. All diffracted beams would therefore increase in intensity after heavy-metal
substitution if all interference were positive. In fact, however, some interference is negative; consequently, following
heavy-metal substitution, some spots measurably increase in intensity, others decrease, and many show no detectable
difference.

Phase differences between diffracted spots can be determined from intensity changes following heavy-metal 
substitution. First the intensity differences are used to deduce the positions of the heavy atoms in the crystal unit cell. 
Fourier summations of these intensity differences give maps of the vectors between the heavy atoms, the so-called 
Patterson maps. From these vector maps the atomic arrangement of the heavy atoms is deduced. From the positions 
of the heavy metals in the unit cell, one can calculate the amplitudes and phases of their contribution to the diffracted 
beams of protein crystals containing heavy metals. 
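A one-dimensional toy version of the Patterson step is sketched below. It assumes the measured intensity differences have already been reduced to the heavy-atom intensities |F_H(h)|² alone (an idealization) and evaluates $P(u) = \sum_h |F(h)|^2 \cos(2\pi h u)$. No phases are used, yet the highest non-origin peak falls at the vector between the heavy atoms. The heavy-atom positions and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def intensities_from_atoms(positions: Sequence[float], f: float, h_max: int) -> List[Tuple[int, float]]:
    """|F(h)|^2 for identical point atoms at fractional positions along one axis."""
    data = []
    for h in range(-h_max, h_max + 1):
        re = sum(f * math.cos(2 * math.pi * h * x) for x in positions)
        im = sum(f * math.sin(2 * math.pi * h * x) for x in positions)
        data.append((h, re * re + im * im))
    return data


def patterson_1d(intensities: Sequence[Tuple[int, float]], n_grid: int = 100) -> List[float]:
    """P(u) = sum_h |F(h)|^2 cos(2*pi*h*u), sampled at u = k / n_grid."""
    return [
        sum(i_h * math.cos(2 * math.pi * h * k / n_grid) for h, i_h in intensities)
        for k in range(n_grid)
    ]


def strongest_vector(patterson: Sequence[float], exclude: float = 0.05) -> float:
    """Highest peak away from the origin, folded into [0, 0.5] (P(u) = P(-u))."""
    n = len(patterson)
    best_k = max(
        (k for k in range(n) if min(k / n, 1 - k / n) >= exclude),
        key=lambda k: patterson[k],
    )
    u = best_k / n
    return min(u, 1 - u)


if __name__ == "__main__":
    # Hypothetical heavy-atom sites at x = 0.10 and x = 0.35 (separation 0.25).
    data = intensities_from_atoms([0.10, 0.35], f=80.0, h_max=20)
    print("heavy-atom vector:", strongest_vector(patterson_1d(data)))
```

In three dimensions the same idea yields a map of all heavy-atom-to-heavy-atom vectors, from which the heavy-atom arrangement is deduced.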

This knowledge is then used to find the phase of the contribution from the protein in the absence of the
heavy-metal atoms. As both the phase and amplitude of the heavy metals and the amplitude of the protein alone are known, as
well as the amplitude of the protein plus heavy metals (i.e., the protein heavy-metal complex), one phase and three
amplitudes are known. From this, the interference of the x-rays scattered by the heavy metals and protein can be
calculated to see if it is constructive or destructive. The extent of positive or negative interference, with knowledge of
the phase of the heavy metal, gives an estimate of the phase of the protein. Because two different phase angles are
determined and are equally good solutions, a second heavy-metal complex can be used, which also gives two possible
phase angles. Only one of these will have the same value as one of the two previous phase angles; it therefore represents
the correct phase angle. In practice, more than two different heavy-metal complexes are usually made in order to give
a reasonably good phase determination for all reflections. Each individual phase estimate contains experimental errors
arising from errors in the measured amplitudes. Furthermore, for many reflections, the intensity differences are too small
to measure after one particular isomorphous replacement, and other derivatives can be tried.
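The two-fold ambiguity and its resolution by a second derivative can be sketched numerically. The sketch assumes error-free amplitudes and heavy-atom contributions F_H already computed as complex numbers from the heavy-atom positions. From $|F_{PH}|^2 = |F_P|^2 + |F_H|^2 + 2|F_P||F_H|\cos(\alpha_P - \varphi_H)$ (the standard Harker construction), one derivative yields two candidate protein phases α_P; the candidate most consistent with all derivatives is kept. The function names and the example reflection are hypothetical.

```python
import cmath
import math
from typing import Sequence, Tuple


def phase_candidates(fp: float, fph: float, fh: complex) -> Tuple[float, float]:
    """The two protein phases alpha in [0, 2*pi) with |fp*exp(i*alpha) + fh| = fph."""
    fh_amp, phi_h = abs(fh), cmath.phase(fh)
    cos_term = (fph ** 2 - fp ** 2 - fh_amp ** 2) / (2.0 * fp * fh_amp)
    cos_term = max(-1.0, min(1.0, cos_term))  # clamp: measurement error can push it past +/-1
    delta = math.acos(cos_term)
    two_pi = 2.0 * math.pi
    return ((phi_h + delta) % two_pi, (phi_h - delta) % two_pi)


def mir_phase(fp: float, derivatives: Sequence[Tuple[float, complex]]) -> float:
    """Pick the first derivative's candidate that best fits every (|F_PH|, F_H) pair."""

    def misfit(alpha: float) -> float:
        fp_vec = fp * cmath.exp(1j * alpha)
        return sum((abs(fp_vec + fh) - fph) ** 2 for fph, fh in derivatives)

    fph0, fh0 = derivatives[0]
    return min(phase_candidates(fp, fph0, fh0), key=misfit)


if __name__ == "__main__":
    # Hypothetical reflection: true protein phase 1.0 rad, two heavy-atom derivatives.
    fp, true_alpha = 10.0, 1.0
    heavy = [3.0 * cmath.exp(0.2j), 4.0 * cmath.exp(2.5j)]
    derivs = [(abs(fp * cmath.exp(1j * true_alpha) + fh), fh) for fh in heavy]
    print("candidates from derivative 1:", phase_candidates(fp, *derivs[0]))
    print("phase chosen using both derivatives:", mir_phase(fp, derivs))
```

Practical phasing programs generally combine error-weighted phase probability distributions from all derivatives rather than making a hard choice; the selection here only mirrors the "common solution" argument in the text.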

The amplitudes and the phases of the diffraction data from the protein crystals are used to calculate an
electron-density map of the repeating unit of the crystal. This map then has to be interpreted as a polypeptide chain with a
particular amino acid sequence. The interpretation of the electron-density map is made more complex by several
limitations of the data. First of all, the map itself contains errors, mainly due to errors in the phase angles. In addition,
the quality of the map depends on the resolution of the diffraction data, which in turn depends on how well ordered the
crystals are. This directly influences the image that can be produced. The resolution is measured in Å units; the smaller
this number is, the higher the resolution and therefore the greater the amount of detail that can be seen.
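The role of both amplitudes and phases, and of resolution, can be seen in a one-dimensional Fourier synthesis $\rho(x) = \sum_h F(h)\exp(-2\pi i h x)$ (the 1/V scale factor is omitted). The sketch assumes a single point atom at a hypothetical fractional position x = 0.25; the helper names are also hypothetical. Keeping more reflections (larger h_max, i.e., higher resolution) narrows the density peak, while discarding the phases moves the peak away from the atom.

```python
import cmath
import math
from typing import Dict, List


def point_atom_coeffs(x: float, h_max: int) -> Dict[int, complex]:
    """Amplitude and phase of one unit point atom at fractional x, for |h| <= h_max."""
    return {h: cmath.exp(2j * math.pi * h * x) for h in range(-h_max, h_max + 1)}


def density_1d(coeffs: Dict[int, complex], n_grid: int = 100) -> List[float]:
    """rho(x) = sum_h F(h) exp(-2*pi*i*h*x), sampled at x = k / n_grid.

    With Friedel pairs F(-h) = conj(F(h)) present, rho is real; the tiny
    imaginary residue left by rounding is discarded.
    """
    return [
        sum(F * cmath.exp(-2j * math.pi * h * k / n_grid) for h, F in coeffs.items()).real
        for k in range(n_grid)
    ]


def peak_width(rho: List[float]) -> int:
    """Crude width: number of grid points above half the maximum density."""
    half = max(rho) / 2.0
    return sum(1 for value in rho if value > half)


if __name__ == "__main__":
    for h_max in (3, 15):
        rho = density_1d(point_atom_coeffs(0.25, h_max))
        peak = max(range(len(rho)), key=rho.__getitem__)
        print(f"h_max={h_max}: peak at x={peak / len(rho):.2f}, width={peak_width(rho)} points")
    # Same amplitudes, phases all set to zero: the peak no longer marks the atom.
    no_phase = {h: abs(F) for h, F in point_atom_coeffs(0.25, 15).items()}
    rho0 = density_1d(no_phase)
    print("phases discarded: peak at x =", max(range(len(rho0)), key=rho0.__getitem__) / len(rho0))
```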

Building the initial model is a trial-and-error process. First, one has to decide how the polypeptide chain
weaves its way through the electron-density map. The resulting chain trace constitutes a hypothesis, by which one tries
to match the density of the side chains to the known sequence of the polypeptide. When a reasonable chain trace has
finally been obtained, an initial model is built to give the best fit of the atoms to the electron density. Computer graphics
are used both for chain tracing and for model building to present the data and manipulate the models.

The initial model will contain some errors. Provided the protein crystals diffract to high enough resolution (e.g.,
better than 3.5 Å), most or substantially all of the errors can be removed by crystallographic refinement of the model
using computer algorithms. In this process, the model is changed to minimize the difference between the experimentally
observed diffraction amplitudes and those calculated for a hypothetical crystal containing the model (instead of the real
molecule). This difference is expressed as an R factor (residual disagreement), which is 0.0 for exact agreement and
about 0.59 for total disagreement.
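The R factor described above is conventionally computed as $R = \sum \big||F_{obs}| - k|F_{calc}|\big| / \sum |F_{obs}|$, where k scales the calculated amplitudes onto the observed ones. The sketch below assumes matched lists of amplitudes for the same reflections and offers a least-squares k; the function names and sample amplitudes are hypothetical.

```python
from typing import Sequence


def least_squares_scale(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """k minimizing sum(|Fo| - k|Fc|)^2, i.e. k = sum(|Fo||Fc|) / sum(|Fc|^2)."""
    num = sum(abs(fo) * abs(fc) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fc) ** 2 for fc in f_calc)
    if den == 0:
        raise ValueError("calculated amplitudes are all zero")
    return num / den


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float], scale: float = 1.0) -> float:
    """R = sum | |Fo| - k|Fc| | / sum |Fo| over matched reflections (0.0 = exact agreement)."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need equal-length, non-empty amplitude lists")
    num = sum(abs(abs(fo) - scale * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


if __name__ == "__main__":
    f_obs = [100.0, 50.0, 25.0]   # hypothetical observed amplitudes
    f_calc = [90.0, 55.0, 25.0]   # hypothetical amplitudes from the model
    k = least_squares_scale(f_obs, f_calc)
    print(f"R (k=1) = {r_factor(f_obs, f_calc):.4f}; R (k={k:.3f}) = {r_factor(f_obs, f_calc, k):.4f}")
```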

In general, the R factor is preferably between 0.15 and 0.35 (such as less than about 0.24-0.28) for a
well-determined protein structure. The residual difference is a consequence of errors and imperfections in the data. These
derive from various sources, including slight variations in the conformation of the protein molecules, as well as 
inaccurate corrections both for the presence of solvent and for differences in the orientation of the microcrystals from 
which the crystal is built. This means that the final model represents an average of molecules that are slightly different 
both in conformation and orientation. 

In refined structures at high resolution, there are usually no major errors in the orientation of individual
residues, and the estimated errors in atomic positions are usually around 0.1-0.2 Å, provided the amino acid sequence
is known. Hydrogen bonds, both within the protein and to bound ligands, can be identified with a high degree of
confidence.

Most x-ray structures are determined to a resolution between 1.7 Å and 3.5 Å. Electron-density maps in this
resolution range are preferably interpreted by fitting the known amino acid sequence into regions of electron density
in which individual atoms are not resolved.

An amino acid sequence is preferred for accurate x-ray structure determination. Thus, recombinant DNA
techniques have had a double impact on x-ray structural work. When a protein is cloned and overexpressed for structural
studies, the amino acid sequence needed for the x-ray work is also quickly obtained via the nucleotide sequence.
Recombinant DNA techniques give us not only abundant supplies of rare proteins, but also their amino acid sequences
as a bonus. See, e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra.
Isolated PPCA and pPPCA Polypeptides 

A PPCA or pPPCA polypeptide can refer to any subset of a PPCA or pPPCA, such as a domain, subdomain,
fragment, consensus sequence or repeating unit thereof. A PPCA or pPPCA polypeptide of the present invention can
be prepared by, e.g.:
(a) recombinant DNA methods;

(b) proteolytic digestion of the intact molecule or a domain, subdomain or fragment thereof;

(c) chemical peptide synthesis methods well known in the art; and/or

(d) any other method capable of producing a PPCA or pPPCA polypeptide having a conformation
similar to a structural or functional subdomain of a PPCA or a pPPCA.

A biological activity of PPCA or pPPCA can be screened according to known screening assays. The minimum
peptide sequence having activity is based on the smallest unit containing or comprising a particular domain, subdomain,
fragment, region, consensus sequence, or repeating unit thereof, having at least one biological activity of a PPCA or
pPPCA, such as protecting activity, inhibiting activity or enzyme activity. Non-limiting examples of such activities are:
protecting activity for β-galactosidase or neuraminidase (NA); modulating activity (inhibition, stimulation or activation)
for endothelin 1 (serine carboxypeptidase) or cathepsin A; and peptide-hydrolyzing activity (e.g., toward substance P and
substance P-free acid; oxytocin and oxytocin-free acid; neurokinin A; angiotensin I; and bradykinin).

According to the present invention, a PPCA or pPPCA includes an association of two or more polypeptide
subdomains, such as at least one 4-amino-acid portion of a core or cap domain of a PPCA or pPPCA. This can include
1-14 subdomains of the cap domain and/or 1-44 subdomains of the core domain (as monomers or dimers), or any range,
value or combination thereof. Preferably, 1-4 sets of each of at least one core or cap domain or subdomain are
included.

The structure of a monomer or domain of at least one PPCA or pPPCA of the present invention can include one or
more of the following subdomains, as described herein. Generally, a PPCA or pPPCA consists of a dimer, each monomer
having a core domain and a cap domain with the following subdomains spanning the specified residues, e.g., as presented
in Figure 13 (pPPCA) or Figure 14 (PPCA):






Core domain subdomains: Cβ1, 21-27; Cβ2, 32-39; Cβ3, 50-54; Cα1, 63-67; Cβ4, 73-75; Cβ5, 82-84;
Cβ6, 94-98; Cα2, 118-135; Cβ7, 144-149; Cα3, 152-163; Cβ8, 171-177; Cα4, 307-313; Cα5, 316-321;
Cα6, 336-341; Cα7, 350-359; Cβ9, 363-369; Cα8, 377-386; Cβ10, 391-401; Cβ11, 407-416; Cβ12, 419-424;
Cα9, 431-434; Cα10, 436-447; and

Cap domain subdomains: Hα1, 183-196; Hα2, 202-212; Hα3, 226-240; Mβ1, 261-264; Mβ2, 267-270;
Mα1, 290-293; Mβ3, 296-299. Note that for monomer 2 the secondary structure assignments in the cap
domain are slightly different than in monomer 1.
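The subdomain list lends itself to a simple interval lookup. The sketch below transcribes the monomer-1 core and cap residue ranges listed above into a table (with "alpha" for helices and "beta" for strands spelled out in ASCII) and reports which subdomain, if any, contains a given residue, or which two elements flank it. The catalytic residues named elsewhere in this description (Ser150, His429, Asp372) serve as a convenient check: each falls in a loop between a strand and a helix. All table keys and function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Monomer-1 residue ranges (inclusive) from the subdomain list; "alpha" = helix, "beta" = strand.
CORE: Dict[str, Tuple[int, int]] = {
    "Cbeta1": (21, 27), "Cbeta2": (32, 39), "Cbeta3": (50, 54), "Calpha1": (63, 67),
    "Cbeta4": (73, 75), "Cbeta5": (82, 84), "Cbeta6": (94, 98), "Calpha2": (118, 135),
    "Cbeta7": (144, 149), "Calpha3": (152, 163), "Cbeta8": (171, 177), "Calpha4": (307, 313),
    "Calpha5": (316, 321), "Calpha6": (336, 341), "Calpha7": (350, 359), "Cbeta9": (363, 369),
    "Calpha8": (377, 386), "Cbeta10": (391, 401), "Cbeta11": (407, 416), "Cbeta12": (419, 424),
    "Calpha9": (431, 434), "Calpha10": (436, 447),
}
CAP: Dict[str, Tuple[int, int]] = {
    "Halpha1": (183, 196), "Halpha2": (202, 212), "Halpha3": (226, 240), "Mbeta1": (261, 264),
    "Mbeta2": (267, 270), "Malpha1": (290, 293), "Mbeta3": (296, 299),
}
# All elements in chain order; the cap ranges fall between core residues 177 and 307.
ELEMENTS = sorted({**CORE, **CAP}.items(), key=lambda item: item[1][0])


def subdomain_of(residue: int) -> Optional[str]:
    """Name of the element containing the residue, or None if it lies in a loop."""
    for name, (start, end) in ELEMENTS:
        if start <= residue <= end:
            return name
    return None


def domain_of(residue: int) -> Optional[str]:
    """'core' or 'cap' for residues inside a listed element, else None."""
    name = subdomain_of(residue)
    if name is None:
        return None
    return "core" if name in CORE else "cap"


def flanking(residue: int) -> Tuple[Optional[str], Optional[str]]:
    """Nearest elements ending before and starting after the residue."""
    before: Optional[str] = None
    after: Optional[str] = None
    for name, (start, end) in ELEMENTS:
        if end < residue:
            before = name  # ranges do not overlap, so the last hit is the nearest
        elif start > residue and after is None:
            after = name
    return before, after


if __name__ == "__main__":
    for label, res in [("Ser", 150), ("Asp", 372), ("His", 429)]:
        print(f"{label}{res}:", subdomain_of(res) or "loop", "between", flanking(res))
```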

A PPCA or pPPCA polypeptide of the invention can have at least 80% homology, such as 80-100% overall
homology or identity, with one or more corresponding PPCA or pPPCA subdomains or fragments as described herein,
such as a 4-542 amino acid fragment or portion of the amino acid sequence of Figures 13, 14 or 15. As would be
understood by one of ordinary skill in the art, the above configurations of subdomains are provided as part of a PPCA
or pPPCA polypeptide of the invention, when expressed in a suitable host cell or otherwise synthesized, to provide at
least one structural or functional feature of a native PPCA or pPPCA, such as at least one PPCA-related biological
activity. Such activities can be assayed using a suitable assay to establish at least one PPCA biological activity of one
or more PPCAs or pPPCAs of the invention. A PPCA or pPPCA polypeptide of the invention is not naturally occurring,
or is naturally occurring but in a purified or isolated form which does not occur in nature. Examples of suitable PPCA
activity assays include, e.g., cathepsin A activity (Galjart et al., J. Biol. Chem. 266:14754-14762 (1991)); endothelin 1
deamidase activity (Jackman et al., J. Biol. Chem. 267:2872-2875 (1992)); and tachykinin deamidase activity (Jackman
et al., J. Biol. Chem. 265:11265-11272 (1990)).

Percent homology or identity can be determined, for example, by comparing sequence information using the
GAP computer program, version 6.0, available from the University of Wisconsin Genetics Computer Group (UWGCG).
The GAP program utilizes the alignment method of Needleman and Wunsch (J. Mol. Biol. 48:443 (1970)), as revised
by Smith and Waterman (Adv. Appl. Math. 2:482 (1981)). Briefly, the GAP program defines similarity as the number
of aligned symbols (i.e., nucleotides or amino acids) which are similar, divided by the total number of symbols in the
shorter of the two sequences. The preferred default parameters for the GAP program include: (1) a unitary comparison
matrix (containing a value of 1 for identities and 0 for non-identities) and the weighted comparison matrix of Gribskov
and Burgess, Nucl. Acids Res. 14:6745 (1986), as described by Schwartz and Dayhoff, eds., ATLAS OF PROTEIN
SEQUENCE AND STRUCTURE, National Biomedical Research Foundation, pp. 353-358 (1979); (2) a penalty of 3.0
for each gap and an additional 0.10 penalty for each symbol in each gap; and (3) no penalty for end gaps.

Thus, one of ordinary skill in the art, given the teachings and guidance presented in the present specification,
will know how to add, delete or substitute other amino acid residues in other positions of a PPCA or pPPCA to obtain
substituted, deletional or additional variants thereof.

Non-limiting examples of substitutions of PPCA or pPPCA domains or polypeptides of the invention are those
in which at least one amino acid residue in the protein molecule has been removed and a different residue inserted in
its place according to the following Table 2. The types of substitutions which can be made in the protein or peptide
molecule of the invention can be based on analysis of the frequencies of amino acid changes between homologous
proteins of different species, such as those presented in Figure 15. Based on such an analysis, alternative substitutions
are defined herein as exchanges within one of the following five groups:



1. Small aliphatic, nonpolar or slightly polar residues: Ala, Ser, Thr (Pro, Gly);

2. Polar, negatively charged residues and their amides: Asp, Asn, Glu, Gln;

3. Polar, positively charged residues: His, Arg, Lys;

4. Large aliphatic, nonpolar residues: Met, Leu, Ile, Val (Cys); and

5. Large aromatic residues: Phe, Tyr, Trp.
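The five exchange groups translate directly into a lookup for screening candidate substitutions. The sketch below is a minimal version that uses three-letter codes and, as an assumption, treats the parenthesized residues (Pro, Gly; Cys) as full members of their groups; the names are hypothetical.

```python
from typing import Dict, Optional

# Exchange groups from the list above (three-letter codes), parenthesized members included.
EXCHANGE_GROUPS: Dict[int, tuple] = {
    1: ("Ala", "Ser", "Thr", "Pro", "Gly"),
    2: ("Asp", "Asn", "Glu", "Gln"),
    3: ("His", "Arg", "Lys"),
    4: ("Met", "Leu", "Ile", "Val", "Cys"),
    5: ("Phe", "Tyr", "Trp"),
}

# Reverse index: residue code -> group number.
_GROUP_OF: Dict[str, int] = {res: g for g, members in EXCHANGE_GROUPS.items() for res in members}


def exchange_group(residue: str) -> Optional[int]:
    """Group number for a three-letter code (case-insensitive), or None if unlisted."""
    return _GROUP_OF.get(residue.capitalize())


def is_within_group(original: str, replacement: str) -> bool:
    """True when both residues belong to the same exchange group."""
    g1, g2 = exchange_group(original), exchange_group(replacement)
    return g1 is not None and g1 == g2


if __name__ == "__main__":
    for orig, new in [("Ser", "Thr"), ("Asp", "Lys"), ("LEU", "cys"), ("Phe", "Trp")]:
        print(f"{orig} -> {new}:", "within group" if is_within_group(orig, new) else "across groups")
```

A within-group exchange is only a candidate; as the following paragraph notes, its effect would still be confirmed in a PPCA or pPPCA activity assay.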



Most deletions, additions, and substitutions according to the invention are those which do not produce
radical changes in the characteristics of the protein or peptide molecule. "Characteristics" is defined in a non-inclusive
manner to cover both changes in secondary structure, e.g., α-helix or β-sheet, and changes in physiological
activity, e.g., in biological activity assays. However, when the exact effect of the substitution, deletion, or addition is
to be confirmed, one skilled in the art will appreciate that the effect of at least one substitution, addition or deletion will
be evaluated by at least one PPCA or pPPCA screening assay, such as, but not limited to, immunoassays or bioassays,
to confirm at least one PPCA or pPPCA biological activity.

Surprisingly, a PPCA and/or a pPPCA has now been discovered to have serine carboxypeptidase activity and
corresponding structural features, despite having only about 30% sequence identity to wheat and yeast serine
carboxypeptidases. These carboxypeptidases are members of the α/β hydrolase fold family (Liao et al., Biochemistry
31:9796-9812 (1992); Endrizzi et al., Biochemistry 33:11106-11120 (1994); Ollis et al., Protein Eng. 5:197-211 (1992)).

The serine carboxypeptidases have peptidase activity at acidic pH (pH 4.5-5.5) as well as deamidase and esterase
activities at pH 7 (reviewed in Breddam et al., Carlsberg Res. Commun. 57:83-128 (1986); Rawlings & Barrett, Methods
in Enzymology 244:19-61 (1994)). Mutagenesis studies and enzymatic assays have revealed that only the mature form
of PPCA possesses serine carboxypeptidase activity, which is similar to that of lysosomal cathepsin A and shows a
preference for hydrophobic substrates such as the dipeptide Phe-Ala (Galjart et al., J. Biol. Chem. 266:14754-14762
(1991)). On the basis of sequence alignments with members of the serine carboxypeptidase family, mutagenesis studies
and the structure determination of pPPCA, the catalytic triad in PPCA has now been determined to be formed by the
residues Ser150, His429 and Asp372.

PPCA and pPPCA Expression for Isolation and Purification

A nucleic acid sequence encoding a PPCA or a pPPCA (Galjart et al., Cell 54:755-764 (1988)) can be
recombined with vector DNA in accordance with conventional techniques, including blunt-ended or staggered-ended
termini for ligation, restriction enzyme digestion to provide appropriate termini, filling in of cohesive ends as
appropriate, alkaline phosphatase treatment to avoid undesirable joining, and ligation with appropriate ligases.
Techniques for such manipulations are disclosed, e.g., in Sambrook et al., Molecular Cloning: A Laboratory Manual,
Second Edition, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1989); and Ausubel et al., Current Protocols
in Molecular Biology, Wiley Interscience, N.Y. (1988-1995), and are well known in the art.

A nucleic acid molecule, such as DNA, is said to be "capable of expressing" a polypeptide if it contains
nucleotide sequences which contain transcriptional and translational regulatory information and such sequences are
"operably linked" to nucleotide sequences which encode the polypeptide. An operable linkage is a linkage in which the
regulatory DNA sequences and the DNA sequence sought to be expressed are connected in such a way as to permit gene
expression as a PPCA, pPPCA or fragment thereof, in recoverable amounts. The precise nature of the regulatory
regions needed for gene expression can vary from organism to organism, as is well known in the analogous art. See,
e.g., Sambrook, infra, and Ausubel, infra.

The invention accordingly encompasses the expression of a PPCA or a pPPCA in either prokaryotic or
eukaryotic cells, although eukaryotic expression is preferred. Preferred hosts are bacterial or eukaryotic hosts, including
bacteria, yeast, insects, fungi, bird and mammalian cells either in vivo or in situ, or host cells of mammalian, insect, bird
or yeast origin. It is preferred that the mammalian cell or tissue is of human, primate, hamster, rabbit, rodent, cow, pig,
sheep, horse, goat, dog or cat origin, but any other mammalian cell can be used.

Eukaryotic hosts can include yeast, insects, fungi, and mammalian cells either in vivo or in tissue culture.
Preferred eukaryotic hosts can also include, but are not limited to, insect cells and mammalian cells either in vivo or in
tissue culture. Preferred cells include Xenopus oocytes, HeLa cells, cells of fibroblast origin such as VERO or
CHO-K1, or cells of lymphoid origin and their derivatives.

Mammalian cells provide post-translational modifications to protein molecules, including correct folding or
glycosylation at correct sites. Mammalian cells which can be useful as hosts include cells of fibroblast origin, such as,
but not limited to, NIH 3T3, VERO or CHO, or cells of lymphoid origin, such as, but not limited to, the hybridoma
SP2/0-Ag14 or the murine myeloma P3-X63Ag8, hamster cell lines (e.g., CHO-K1 and progenitors, e.g.,
CHO-DUXB11) and their derivatives. One preferred type of mammalian cell is a cell intended to replace the
function of genetically deficient cells in vivo. Neuronally derived cells are preferred for gene therapy of disorders
of the nervous system. For a mammalian cell host, many possible vector systems are available for the expression of
at least one PPCA or pPPCA. A wide variety of transcriptional and translational regulatory sequences can be employed,
depending upon the nature of the host. The transcriptional and translational regulatory signals can be derived from viral
sources, such as, but not limited to, adenovirus, bovine papilloma virus, simian virus, or the like, where the regulatory
signals are associated with a particular gene which has a high level of expression. Alternatively, promoters from
mammalian expression products, such as, but not limited to, actin, collagen, or myosin, can be used for protein production.

When live insects are to be used, silk moth caterpillars and baculoviral vectors are presently preferred hosts
for large-scale PPCA or pPPCA production according to the invention. Production of PPCA or pPPCA in insects can
be achieved, for example, by infecting the insect host with a baculovirus engineered to express transmembrane
polypeptide by methods known to those skilled in the related arts. See Ausubel, infra, §§16.8-16.11.

In a preferred embodiment, the introduced nucleotide sequence will be incorporated into a plasmid or viral
vector capable of autonomous replication in the recipient host. Any of a wide variety of vectors can be employed for
this purpose. See, e.g., Ausubel et al., infra, §§1.5, 1.10, 7.1, 7.3, 8.1, 9.6, 9.7, 13.4, 16.2, 16.6, and 16.8-16.11. Factors
of importance in selecting a particular plasmid or viral vector include: the ease with which recipient cells that contain
the vector can be recognized and selected from those recipient cells which do not contain the vector; the number of
copies of the vector which are desired in a particular host; and whether it is desirable to be able to "shuttle" the vector
between host cells of different species.

Different host cells have characteristic and specific mechanisms for the translational and post-translational
processing and modification (e.g., glycosylation, cleavage) of proteins. Appropriate cell lines or host systems can be
chosen to ensure the desired modification and processing of the foreign protein expressed. For example, expression in
a bacterial system can be used to produce an unglycosylated core protein product. Expression in yeast will produce a
glycosylated product. Expression in mammalian cells can be used to ensure "native" glycosylation of the heterologous
PPCA or pPPCA. Furthermore, different vector/host expression systems can effect processing reactions such as
proteolytic cleavages to different extents.

As discussed above, expression of PPCA or pPPCA in eukaryotic hosts requires the use of eukaryotic regulatory
regions. Such regions will, in general, include a promoter region sufficient to direct the initiation of RNA synthesis.
See, e.g., Ausubel, infra; Sambrook, infra.

Once the vector or nucleic acid molecule containing the construct(s) has been prepared for expression, the DNA
construct(s) can be introduced into an appropriate host cell by any of a variety of suitable means, e.g., transformation,
transfection, conjugation, protoplast fusion, electroporation, particle gun technology, calcium phosphate precipitation,
direct microinjection, and the like. After the introduction of the vector, recipient cells are grown in a selective medium,
which selects for the growth of vector-containing cells. Expression of the cloned gene molecule(s) results in the
production of a PPCA or pPPCA. This can take place in the transformed cells as such, or following the induction of
these cells to differentiate (for example, by administration of bromodeoxyuracil to neuroblastoma cells or the like).

A PPCA or pPPCA, or fragments thereof, of this invention can be obtained by expression from recombinant
DNA according to known methods. Alternatively, a PPCA or pPPCA can be purified from biological material. A PPCA
or a pPPCA can be purified from different mammalian tissues (e.g., human placenta, rat liver, mouse liver, pig kidney,
bovine testes, bovine liver, and the like) of various genera and species.

The PPCA or pPPCA can be isolated and purified in accordance with conventional method steps, such as
extraction, precipitation, chromatography, affinity chromatography, electrophoresis, or the like. For example, cells
expressing at least one PPCA or pPPCA at suitable levels can be collected by centrifugation and lysed with suitable
buffers, and the protein isolated by column chromatography, for example, on DEAE-cellulose, phosphocellulose,
polyribocytidylic acid-agarose, or hydroxyapatite, or by electrophoresis or immunoprecipitation. Alternatively, a pPPCA
or PPCA can be isolated by the use of antibodies, such as, but not limited to, a PPCA- or pPPCA-specific antibody. Such
antibodies can be obtained by known method steps (see, e.g., Harlow and Lane, ANTIBODIES: A LABORATORY
MANUAL, Cold Spring Harbor Laboratory (1988); Coligan et al., eds., Current Protocols in Immunology, Greene
Publishing Assoc. and Wiley Interscience, N.Y. (1992, 1993), the contents of which references are entirely incorporated
herein by reference).

A PPCA or a pPPCA can be purified from different mammalian tissues (e.g., human placenta, rat liver, mouse
liver, pig kidney, bovine testes, bovine liver, and the like) of various genera and species, using known techniques such
as gel filtration, phase separation and affinity chromatography, e.g., using polyclonal or monoclonal antibodies specific
for a PPCA or pPPCA, according to known methods. See, e.g., Oxender et al., Protein Engineering, Liss, New York
(1986).

Overview of PPCA or pPPCA Purification and Crystallization Methods 

In general, a PPCA or pPPCA is isolated in soluble form (e.g., as a monomer or dimer) in sufficient purity and
concentration for crystallization. The PPCA or pPPCA is then assayed for biological activity (e.g., cathepsin A
activity) and for lack of aggregation (which interferes with crystallization). The purified PPCA or pPPCA preferably runs
as a single band for each monomer under reducing or nonreducing polyacrylamide gel electrophoresis (PAGE)
(nonreducing conditions are used to evaluate the presence of cysteine bridges).

The purified PPCA or pPPCA is preferably crystallized under varying conditions of at least one of the
following: pH, buffer type, buffer concentration, salt type, polymer type, polymer concentration, other precipitating
ligands and concentration of purified PPCA or pPPCA. See, e.g., known methods (Blundell et al., Protein
Crystallography, Academic Press, London (1976); Oxender, infra; McPherson, The Preparation and Analysis of Protein
Crystals, Wiley Interscience, N.Y. (1982)) or methods provided in a commercial kit, such as CRYSTAL SCREEN
(Hampton Research, Riverside, CA). The crystallized PPCA protein can optionally be tested for at least one PPCA
activity, and differently sized and shaped crystals are further tested for suitability for x-ray diffraction. Generally, larger
crystals provide better crystallographic data than smaller crystals, and thicker crystals provide better crystallographic
data than thinner crystals. See, e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff et al., Diffraction
Methods for Biological Macromolecules, Vols. 114-115, Methods in Enzymology, Academic Press, Orlando, FL (1985).
Protein Crystallization Methods 

The hanging drop method is preferably used to crystallize the purified protein. See, e.g., Blundell, infra;
Oxender, infra; McPherson, infra; Wyckoff, infra; Taylor et al., J. Mol. Biol. 226:1287-1290 (1992); Takimoto et al.
(1992), infra; CRYSTAL SCREEN, Hampton Research.

A mixture of the purified protein and precipitant can include the following:

• pH (e.g., 7-9);

• buffer type (e.g., tromethamine (TRIZMA), sodium azide (NaN3), phosphate, sodium acetate, cacodylate,
imidazole, Tris-HCl, sodium HEPES);

• buffer concentration (e.g., 1-100 mM);

• salt type (e.g., sodium azide, calcium chloride, sodium citrate, magnesium chloride, ammonium
acetate, ammonium sulfate, potassium phosphate, magnesium acetate, zinc acetate, calcium acetate);

• polymer type and concentration (e.g., polyethylene glycol (PEG) 1-50%, types PEG 400-PEG 10,000);

• other additives (salts: potassium sodium tartrate, ammonium sulfate, sodium acetate, lithium sulfate,
sodium formate, sodium citrate, magnesium formate, sodium phosphate, potassium phosphate;
organics: 2-propanol; non-volatile: 2-methyl-2,4-pentanediol; β-octyl glucoside); and

• concentration of purified PPCA or pPPCA (e.g., 1.0-100 mg/ml).
See, e.g., CRYSTAL SCREEN, Hampton Research.

A non-limiting example of such crystallization conditions is the following:

• purified PPCA or pPPCA protein (e.g., 5 mg/ml);
• two solutions in serial mixtures:

(1) 40-80 mM TRIZMA, 0.05-2.0 mM NaN3;

(2) 2-30% polyethylene glycol (PEG) 8000 buffered with 40-80 mM TRIZMA and
0.05-2.0 mM NaN3;

• 0.05-0.5% β-octyl glucoside;

• at an overall pH of about 8.0-8.3.

The above mixtures are used and screened by varying at least one of pH, buffer type, buffer concentration,
precipitating salt type or additive or their concentrations, PEG type, PEG concentration, and protein concentration.
Crystals ranging in size from 0.1-0.9 mm are formed in 1-14 days. These crystals diffract x-rays to at least 10 Å
resolution, such as 0.15-10.0 Å, or any range or value therein, such as 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5,
2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4 or 3.5 Å, with 3.5 Å or higher resolution being preferred. In addition
to diffraction patterns at this highest resolution, lower-resolution data, such as 25-3.5 Å, can also be used. See, e.g.,
Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra.
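Screening by varying one or more of these parameters can be pictured as enumerating a grid of conditions. The sketch below builds a full-factorial grid with `itertools.product`; the `Condition` record and the particular pH, PEG, protein and β-octyl glucoside levels are hypothetical picks from the ranges stated above.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Condition:
    ph: float
    peg_pct: float
    protein_mg_ml: float
    octyl_glucoside_pct: float


def build_screen(ph_values, peg_values, protein_values, detergent_values):
    """Enumerate every combination of the varied parameters (full factorial)."""
    return [
        Condition(ph, peg, prot, det)
        for ph, peg, prot, det in product(ph_values, peg_values, protein_values, detergent_values)
    ]


# Hypothetical levels drawn from the stated ranges.
screen = build_screen(
    ph_values=[8.0, 8.15, 8.3],
    peg_values=[2, 10, 20, 30],
    protein_values=[5.0, 10.0],
    detergent_values=[0.05, 0.5],
)
print(len(screen))  # 3 * 4 * 2 * 2 combinations
print(screen[0])
```

A full-factorial grid grows multiplicatively with each added parameter, which is one reason sparse-matrix screens such as CRYSTAL SCREEN sample only a subset of combinations.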
Protein Crystals 

Crystals appear after 1-14 days and continue to grow on subsequent days. Some of the crystals can
optionally be removed, washed, and assayed for biological activity (e.g., PPCA activity); crystals retaining such activity
are preferred for further characterization. Other washed crystals are preferably run on a gel and stained, and those that
migrate in the same position as the purified PPCA or pPPCA are preferably used. From two to one hundred crystals are observed in
one drop, and various crystal forms can occur, such as, but not limited to, orthorhombic, bipyramidal, rhomboid, and cubic. Initial
x-ray analyses indicate that such crystals diffract at moderately high to high resolution. When fewer crystals are
produced in a drop, they can be much larger, e.g., 0.4-0.9 mm. See, e.g., Blundell, infra; Oxender, infra;
McPherson, infra; Wyckoff, infra.
PPCA and pPPCA X-ray Crystallography Methods

The crystals so produced for a PPCA or pPPCA are x-ray analyzed using a suitable x-ray source, and diffraction
patterns are obtained. Crystals are preferably stable for at least 10 hrs in the x-ray beam. Frozen crystals (e.g., -220
to -50°C) are optionally used for longer x-ray exposures (e.g., 5-72 hrs), the crystals being relatively more stable to the
x-rays in the frozen state. To collect the maximum number of useful reflections, multiple frames are optionally collected
as the crystal is rotated in the x-ray beam, e.g., for 5-72 hrs. Larger crystals (>0.2 mm) are preferred, to increase the
resolution of the x-ray diffraction patterns obtained. Crystals are preferably analyzed using a synchrotron high-energy
x-ray source. Using frozen crystals, x-ray diffraction data are collected on crystals that diffract to at least a relatively high
resolution of 10-1.5 Å, with lower resolutions, such as 25-10 Å, also useful; such data are sufficient to solve the three-dimensional
structure of a PPCA or pPPCA in considerable detail, as presented herein.
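The resolution figures above relate to spot position through Bragg's law, λ = 2d sin θ. Assuming an idealized flat detector perpendicular to the beam at distance D (no tilt or offset), a spot at radius R from the beam center lies at scattering angle 2θ = arctan(R/D). A minimal Python sketch; the 1.0 Å wavelength and 200 mm distance are illustrative assumptions:

```python
import math


def resolution_at_radius(radius_mm, distance_mm, wavelength_a):
    """d-spacing (Å) of a spot radius_mm from the beam center (Bragg: λ = 2d sin θ)."""
    two_theta = math.atan2(radius_mm, distance_mm)  # flat detector normal to the beam
    return wavelength_a / (2.0 * math.sin(two_theta / 2.0))


def radius_for_resolution(d_a, distance_mm, wavelength_a):
    """Radius (mm) at which reflections of spacing d_a land; valid while 2θ < 90°."""
    theta = math.asin(wavelength_a / (2.0 * d_a))  # requires d_a >= wavelength_a / 2
    return distance_mm * math.tan(2.0 * theta)


# Hypothetical setup: 1.0 Å synchrotron wavelength, detector 200 mm from the crystal.
print(round(resolution_at_radius(150.0, 200.0, 1.0), 3))  # spot 150 mm from the center
print(round(radius_for_resolution(1.5, 200.0, 1.0), 1))   # where 1.5 Å data would land
```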

Passing an x-ray beam through a crystal produces a diffraction pattern as a result of the x-rays interacting with and
being scattered by the contents of the crystal. The diffraction pattern can be visualized using, e.g., an image plate or
film, resulting in an image with spots corresponding to the diffracted x-rays. The positions of the spots in the diffraction
pattern are used to determine parameters intrinsic to the crystal (such as unit cell parameters) and to gain information on
the packing of the molecules in the crystal. The intensities of the spots contain information on the Fourier transform of the
molecules in the crystal, i.e., information on each atom in the crystal and hence on the crystallized molecule.
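The link between spot intensities and the Fourier transform of the crystal contents can be made concrete with the structure factor F(hkl) = Σ_j f_j exp[2πi(hx_j + ky_j + lz_j)], whose squared modulus is proportional to the measured intensity. The sketch below is a simplification that assumes point atoms with constant scattering factors (no angle dependence, no temperature factors) at fractional coordinates; the two-atom toy cell is invented for illustration.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """F(hkl) = sum_j f_j * exp(2*pi*i*(h*x + k*y + l*z)) for point atoms.

    atoms: iterable of (f, (x, y, z)) with fractional coordinates.
    """
    h, k, l = hkl
    total = 0j
    for f, (x, y, z) in atoms:
        total += f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
    return total


def intensity(hkl, atoms):
    """Measured intensity is proportional to |F|^2; the phase of F is lost."""
    return abs(structure_factor(hkl, atoms)) ** 2


# Hypothetical toy cell: two atoms with scattering factors 6 and 8.
toy_atoms = [(6.0, (0.0, 0.0, 0.0)), (8.0, (0.5, 0.5, 0.5))]
for hkl in [(1, 0, 0), (1, 1, 0), (1, 1, 1)]:
    F = structure_factor(hkl, toy_atoms)
    print(hkl, round(abs(F), 3), round(intensity(hkl, toy_atoms), 3))
```

Because only |F|² is recorded, the phases have to be obtained separately, for example by molecular replacement.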

After the diffraction patterns are collected, the data are processed. This includes measuring the spots on each
diffraction pattern in terms of position and intensity. This information is then processed (i.e., mathematical operations
such as scaling, merging and converting the intensities of the diffracted beams to amplitudes are performed on the data)
to yield a data set in a form that can be used for further structure determination of the
crystallized molecule. The amplitudes of the diffracted x-rays are then combined with calculated phases to produce an
electron density map of the contents of the crystal. In this electron density map, the structure of the molecules (as
present in the crystal) is built. The phases can be determined with various known techniques, one being molecular
replacement.

For the molecular replacement technique, one takes a known three-dimensional structure thought to share
structural homology with the structure to be determined and, after calculations, uses it to generate a first set of initial
phases. These phases are then combined with the diffraction information of the molecule whose structure is to be solved.
The result is an electron density map of the molecules in the crystal from which the diffraction patterns originate.
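A one-dimensional sketch of combining observed amplitudes with phases derived from a model, ρ(x) = (1/V) Σ_h |F_obs(h)| exp[iφ_model(h)] exp(-2πihx). The "observed" amplitudes and the incomplete model are both generated from a hypothetical point-atom cell, V is taken as 1, and the rotation and translation searches of a real molecular replacement calculation are omitted; the sketch only shows how model phases and measured amplitudes are merged into a map.

```python
import cmath
import math


def structure_factors_1d(atoms, h_max):
    """F(h), h = -h_max..h_max, for point atoms (f, x) in a 1-D cell of unit length."""
    return {
        h: sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)
        for h in range(-h_max, h_max + 1)
    }


def density_map(amplitudes, phases, n_points):
    """rho(x) = sum_h |F(h)| exp(i phi(h)) exp(-2 pi i h x) on an n_points grid (V = 1)."""
    rho = []
    for i in range(n_points):
        x = i / n_points
        value = sum(
            amplitudes[h] * cmath.exp(1j * phases[h]) * cmath.exp(-2j * math.pi * h * x)
            for h in amplitudes
        )
        rho.append(value.real)  # Friedel pairs cancel the imaginary part (up to rounding)
    return rho


# Hypothetical crystal contents and an incomplete search model.
true_atoms = [(8.0, 0.20), (6.0, 0.45), (7.0, 0.70)]
model_atoms = [(8.0, 0.20), (6.0, 0.45)]  # atom at x = 0.70 deliberately left out

f_obs = structure_factors_1d(true_atoms, h_max=10)  # stands in for measured data
f_model = structure_factors_1d(model_atoms, h_max=10)

amplitudes = {h: abs(F) for h, F in f_obs.items()}         # observed |F|
phases = {h: cmath.phase(F) for h, F in f_model.items()}  # model phases
rho = density_map(amplitudes, phases, n_points=100)

peak = max(range(100), key=lambda i: rho[i])
print("highest peak at x =", peak / 100)
print("map value at the omitted atom (x = 0.70):", round(rho[70], 2))
```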

The phases can be further optimized using a technique called density modification, which allows electron
density maps of better quality to be produced, facilitating interpretation and model building therein. The atomic model
is then refined by allowing the atoms in the model to move in order to match the diffraction data as well as possible

while continuing to satisfy stereochemical constraints (sensible bond lengths, bond angles and the like). See, e.g.,
Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra.
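Agreement between a refined model and the diffraction data is commonly summarized by the crystallographic R-factor, R = Σ||F_obs| - k|F_calc|| / Σ|F_obs|, where k places the calculated amplitudes on the scale of the observed ones. The sketch below assumes a single least-squares scale factor (refinement programs typically also model resolution-dependent scaling and bulk solvent); the amplitudes are hypothetical.

```python
def linear_scale(f_obs, f_calc):
    """Least-squares scale k minimizing sum((|Fo| - k*|Fc|)**2)."""
    num = sum(fo * fc for fo, fc in zip(f_obs, f_calc))
    den = sum(fc * fc for fc in f_calc)
    return num / den


def r_factor(f_obs, f_calc, scale=None):
    """R = sum(| |Fo| - k|Fc| |) / sum(|Fo|) over matching reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    k = linear_scale(f_obs, f_calc) if scale is None else scale
    numerator = sum(abs(fo - k * fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


# Hypothetical observed and calculated amplitudes for five reflections.
f_obs = [120.0, 85.0, 60.0, 44.0, 31.0]
f_calc = [230.0, 180.0, 110.0, 95.0, 55.0]
print(f"R = {100 * r_factor(f_obs, f_calc):.1f}%")
```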
Computer Related Embodiments 

An amino acid sequence of a PPCA or pPPCA and/or atomic coordinate/x-ray diffraction data, useful for
computer structure determination of a PPCA, pPPCA or a portion thereof, can be "provided" in a variety of media
to facilitate use thereof. As used herein, "provided" refers to a manufacture which contains a PPCA or pPPCA amino acid
sequence and/or atomic coordinate/x-ray diffraction data of the present invention, e.g., the amino acid sequence provided
in Figures 13-15, a representative fragment thereof, or an amino acid sequence having at least 80-100% overall identity
to a 5-542 amino acid fragment of an amino acid sequence of Figures 13-15. Such a manufacture provides the amino acid
sequence and/or atomic coordinate/x-ray diffraction data in a form which allows a skilled artisan to analyze and
determine the three-dimensional structure of a PPCA, a pPPCA or a subdomain thereof.

In one application of this embodiment, PPCA, pPPCA, or at least one subdomain thereof, amino acid sequence
and/or atomic coordinate/x-ray diffraction data of the present invention are recorded on computer readable media. As
used herein, "computer readable media" refers to any medium which can be read and accessed directly by a computer.
Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage media, and
magnetic tape; optical storage media, such as optical discs or CD-ROM; electrical storage media, such as RAM and ROM;
and hybrids of these categories, such as magnetic/optical storage media. A skilled artisan can readily appreciate how any
of the presently known computer readable media can be used to create a manufacture comprising a computer readable
medium having recorded thereon an amino acid sequence and/or atomic coordinate/x-ray diffraction data of the present
invention.

As used herein, "recorded" refers to a process for storing information on computer readable medium. A skilled
artisan can readily adopt any of the presently known methods for recording information on computer readable medium
to generate manufactures comprising an amino acid sequence and/or atomic coordinate/x-ray diffraction data information
of the present invention.

A variety of data storage structures are available to a skilled artisan for creating a computer readable medium
having recorded thereon an amino acid sequence and/or atomic coordinate/x-ray diffraction data of the present invention.
The choice of the data storage structure will generally be based on the means chosen to access the stored information.
In addition, a variety of data processor programs and formats can be used to store the sequence and x-ray data
information of the present invention on computer readable medium. The sequence information can be represented in
a word processing text file, formatted in commercially available software such as WordPerfect and MICROSOFT Word,
or represented in the form of an ASCII file, stored in a database application, such as DB2, Sybase, Oracle, or the like.
A skilled artisan can readily adapt any number of data processor structuring formats (e.g., text file or database) in order
to obtain computer readable medium having recorded thereon the information of the present invention.
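A minimal sketch of the two storage structures mentioned above, a plain ASCII text file (here in FASTA layout) and a database table, using only the Python standard library; `sqlite3` stands in for a database application such as DB2, Sybase or Oracle. The identifier, table layout and placeholder sequence are hypothetical and are not the sequence of Figures 13-15.

```python
import sqlite3
import tempfile
from pathlib import Path


def write_fasta(path, identifier, sequence, width=60):
    """Store a sequence as an ASCII FASTA text file."""
    lines = [f">{identifier}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    Path(path).write_text("\n".join(lines) + "\n", encoding="ascii")


def read_fasta(path):
    """Return (identifier, sequence) from a single-record FASTA file."""
    lines = Path(path).read_text(encoding="ascii").splitlines()
    return lines[0][1:], "".join(lines[1:])


def store_in_db(conn, identifier, sequence):
    """Store the sequence in a simple two-column table keyed by identifier."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences (id TEXT PRIMARY KEY, residues TEXT NOT NULL)"
    )
    conn.execute("INSERT OR REPLACE INTO sequences VALUES (?, ?)", (identifier, sequence))
    conn.commit()


placeholder_seq = "MKLVAAGHSTREQW" * 10  # hypothetical placeholder, 140 residues
with tempfile.TemporaryDirectory() as tmp:
    fasta_path = Path(tmp) / "example.fasta"
    write_fasta(fasta_path, "example_protein", placeholder_seq)
    ident, seq = read_fasta(fasta_path)

conn = sqlite3.connect(":memory:")
store_in_db(conn, ident, seq)
row = conn.execute("SELECT residues FROM sequences WHERE id = ?", (ident,)).fetchone()
print(ident, len(row[0]), row[0] == placeholder_seq)
```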

By providing computer readable media having stored thereon a PPCA or pPPCA sequence and/or atomic
coordinates based on x-ray diffraction data, a skilled artisan can routinely access the sequence and atomic coordinate
or x-ray diffraction data to model a PPCA, pPPCA, a subdomain thereof, or a ligand thereof. Computer algorithms are
publicly and commercially available which allow a skilled artisan to access this data provided on a computer readable
medium and analyze it for structure determination and/or RDD. See, e.g., Biotechnology Software Directory, Mary Ann
Liebert Publ., New York (1995).

The present invention further provides systems, particularly computer-based systems, which contain the
sequence and/or diffraction data described herein. Such systems are designed to perform structure determination and RDD
for a PPCA, pPPCA or at least one subdomain thereof. Non-limiting examples are microcomputer workstations
available from Silicon Graphics Incorporated and Sun Microsystems running Unix-based, Windows NT or IBM OS/2
operating systems.

As used herein, "a computer-based system" refers to the hardware means, software means, and data storage 
means used to analyze the sequence and/or atomic coordinate/x-ray diffraction data of the present invention. The 
minimum hardware means of the computer-based systems of the present invention comprises a central processing unit 
(CPU), input means, output means, and data storage means. A skilled artisan can readily appreciate which of the
currently available computer-based systems are suitable for use in the present invention. A monitor is optionally provided
to visualize structure data. 

As stated above, the computer-based systems of the present invention comprise a data storage means having
stored therein a PPCA, pPPCA or fragment sequence and/or atomic coordinate/x-ray diffraction data of the present
invention and the necessary hardware means and software means for supporting and implementing an analysis means.
As used herein, "data storage means" refers to memory which can store sequence or atomic coordinate/x-ray diffraction
data of the present invention, or a memory access means which can access manufactures having recorded thereon the
sequence or x-ray data of the present invention.

As used herein, "search means" or "analysis means" refers to one or more programs which are implemented
on the computer-based system to compare a target sequence or target structural motif with the sequence or x-ray data
stored within the data storage means. Search means are used to identify fragments or regions of a PPCA or pPPCA
which match a particular target sequence or target motif. A variety of publicly disclosed algorithms and commercially
available software for conducting searches can be used in the computer-based systems of the present invention.
A skilled artisan can readily recognize that any one of the available algorithms or implementing software packages
for conducting computer analyses can be adapted for use in the present computer-based systems.
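In its simplest form, a search means as defined above is a pattern match of a target sequence motif against a stored sequence. The sketch below uses Python regular expressions, with a lookahead so that overlapping matches are also reported, and gives 1-based positions; the G-x-S-x-G motif and the stored sequence are hypothetical examples, not PPCA data.

```python
import re


def find_motif(sequence, motif_regex):
    """Return 1-based (start, end, matched text) for every match, including overlaps."""
    pattern = re.compile(f"(?=({motif_regex}))")  # zero-width lookahead allows overlaps
    hits = []
    for m in pattern.finditer(sequence):
        text = m.group(1)
        start = m.start() + 1
        hits.append((start, start + len(text) - 1, text))
    return hits


stored = "MAGHSLGGESYAGKLPGTSAGW"  # hypothetical stored sequence
print(find_motif(stored, "G.S.G"))  # "." matches any residue
```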

As used herein, "a target structural motif" or "target motif" refers to any rationally selected sequence or
combination of sequences in which the sequence(s) are chosen based on a three-dimensional configuration or electron
density map which is formed upon the folding of the target motif. There are a variety of target motifs known in the art.
Protein target motifs include, but are not limited to, enzymic active sites, structural subdomains, epitopes, functional
domains and signal sequences. A variety of structural formats for the input and output means can be used to input and 
output the information in the computer-based systems of the present invention. 

A variety of comparing means can be used to compare a target sequence or target motif with the data storage
means to identify structural motifs or interpret electron density maps derived in part from the atomic coordinate/x-ray
diffraction data. A skilled artisan can readily recognize that any one of the publicly available computer modeling 
programs can be used as the search means for the computer-based systems of the present invention. 
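A comparison against a target structural motif can likewise be sketched as a geometric query, e.g., listing every residue with at least one atom within a cutoff distance of a chosen site. The sketch assumes a minimal in-memory atom record (chain, residue number and name, atom name, x, y, z in Å) rather than parsing a real coordinate file; the atoms, site and 4.0 Å cutoff are hypothetical.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    chain: str
    res_num: int
    res_name: str
    name: str
    x: float
    y: float
    z: float


def residues_near(atoms, center, cutoff):
    """Residues (chain, number, name) with at least one atom within cutoff (Å) of center."""
    found = set()
    for a in atoms:
        if math.dist((a.x, a.y, a.z), center) <= cutoff:
            found.add((a.chain, a.res_num, a.res_name))
    return sorted(found)


# Hypothetical coordinates; residue numbering is illustrative only.
atoms = [
    Atom("A", 10, "SER", "OG", 10.0, 10.0, 10.0),
    Atom("A", 10, "SER", "CA", 11.5, 10.0, 10.0),
    Atom("A", 20, "ASP", "OD1", 12.0, 12.5, 10.0),
    Atom("A", 30, "HIS", "NE2", 9.0, 13.0, 10.0),
    Atom("B", 40, "LEU", "CD1", 25.0, 25.0, 25.0),
]
print(residues_near(atoms, center=(10.0, 11.0, 10.0), cutoff=4.0))
```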

One application of this embodiment is provided in Figure 22. Figure 22 provides a block diagram of a
computer system 102 that can be used to implement the present invention. The computer system 102 includes a
processor 106 connected to a bus 104. Also connected to the bus 104 are a main memory 108 (preferably implemented
as random access memory, RAM) and a variety of secondary storage memory 110, such as a hard drive 112, a removable
storage medium 114, and a monitor 120. The removable medium storage device 114 may represent, for example, a
floppy disk drive, a CD-ROM drive, a magnetic tape drive, etc. A removable storage medium 116 (such as a floppy
disk, a compact disk, a magnetic tape, etc.) containing control logic and/or data recorded therein may be inserted into
the removable medium storage device 114. The computer system 102 includes appropriate software for reading the
control logic and/or the data from the removable medium storage device 114 once inserted in the removable medium
storage device 114.

Amino acid, encoding nucleotide or other sequence and/or atomic coordinate/x-ray diffraction data of the
present invention may be stored in a well-known manner in the main memory 108, any of the secondary storage devices
110, and/or a removable storage device 116. Software for accessing and processing the amino acid sequence and/or
atomic coordinate/x-ray diffraction data (such as search tools, comparing tools, etc.) resides in main memory 108 during
execution. The monitor 120 is optionally used to visualize the structure data.
Structure Determination 

One or more computational steps, computer programs and/or computer algorithms are used to build a molecular 
3-D model of a PPCA or pPPCA, using amino acid sequence data from Figures 13-15 (or variants thereof) and/or atomic
coordinate/x-ray diffraction data, as presented herein. 

In x-ray crystallography, x-ray diffraction data and phases are combined to produce electron density maps in 
which the three-dimensional structure of a PPCA or pPPCA is then built or modeled. This structure can then be used 
for RDD of modulators of at least one PPCA- or pPPCA-related activity that is relevant to at least one PPCA- or
pPPCA-related pathology. 

Density Modification and Map Interpretation. Electron density maps can be calculated using such programs
as those from the CCP4 computing package (SERC (UK) Collaborative Computing Project 4, Daresbury Laboratory,
UK, 1979). Cycles of two-fold averaging can further be used, such as with the program RAVE (Kleywegt & Jones, in
Bailey et al., eds., First Map to Final Model, SERC Daresbury Laboratory, UK, pp. 59-66 (1994)), together with gradual
model expansion. For map visualization and model building, a program such as O (Jones (1991), infra) can be used.

Refinement and Model Validation. Rigid body and positional refinement can be carried out using a program
such as X-PLOR (Brunger (1992), infra), e.g., with the stereochemical parameters of Engh and Huber (Acta Cryst.
A47:392-400 (1991)). If the model at this stage in the averaged maps still misses residues (e.g., at least 5-10 per
subunit), some or all of the missing residues can be incorporated in the model during additional cycles of positional
refinement and model building. The refinement procedure can start using data from lower resolution (e.g., 25-10 Å to
10-3.0 Å) and then be gradually extended to include data from 12-6 Å to 3.0-1.5 Å. B-values (also termed temperature
factors) for individual atoms can be refined once data of 2.8 Å or higher resolution (e.g., up to 1.5 Å) have been added.
Subsequently, waters can be gradually added. A program such as ARP (Lamzin and Wilson, Acta Cryst. D49:129-147 (1993))
can be used to add crystallographic waters and as a tool to check for bad areas in the model. Programs such as PROCHECK
(Laskowski et al., J. Appl. Cryst. 26:283-291 (1993)), WHATIF (Vriend, J. Mol. Graph. 8:52-56 (1990)) and PROFILE-3D
(Lüthy et al., Nature 356:83-85 (1992)), as well as the geometrical analysis generated by X-PLOR, can be used
to check the structure for errors. A program such as DSSP can be used to assign the secondary structure elements
(Kabsch and Sander (1983), infra).
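Starting refinement at low resolution and then extending it requires sorting reflections by d-spacing. For an orthorhombic cell (α = β = γ = 90°), 1/d² = h²/a² + k²/b² + l²/c². A minimal sketch assuming such a cell (other crystal systems need the general metric tensor); the cell dimensions and reflections are hypothetical:

```python
import math


def d_spacing_orthorhombic(hkl, a, b, c):
    """d (Å) for reflection hkl in an orthorhombic cell with edges a, b, c (Å)."""
    h, k, l = hkl
    inv_d2 = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    if inv_d2 == 0:
        raise ValueError("(0, 0, 0) has no finite d-spacing")
    return 1.0 / math.sqrt(inv_d2)


def select_shell(reflections, cell, d_max, d_min):
    """Keep reflections with d_min <= d <= d_max (d_max is the low-resolution limit)."""
    a, b, c = cell
    return [hkl for hkl in reflections if d_min <= d_spacing_orthorhombic(hkl, a, b, c) <= d_max]


# Hypothetical cell and reflections; a first refinement stage might use 25-10 Å data.
cell = (100.0, 120.0, 150.0)
reflections = [(1, 0, 0), (5, 0, 0), (0, 8, 0), (0, 0, 30), (40, 0, 0)]
print(select_shell(reflections, cell, d_max=25.0, d_min=10.0))
print(select_shell(reflections, cell, d_max=12.0, d_min=3.0))
```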

The structure of a PPCA or pPPCA can thus be solved with the molecular replacement procedure, such as by
using X-PLOR (Brunger (1992), infra). A partial search model for the monomer can be constructed using a related
protein, such as the wheat serine carboxypeptidase structure (Liao et al. (1992), infra). The rotation and translation functions
can be solved to yield orientations and positions for the subunits in the crystallographic asymmetric unit. This allows
phases to be determined that, when combined with information from the x-ray diffraction patterns, allow electron
density maps of a PPCA or pPPCA to be calculated. The atomic model is then built using these electron density maps.
Cyclical two-fold density averaging can also be done to improve the electron density maps using a suitable program
(e.g., RAVE), and model expansion can also be used to add missing residues for each monomer, resulting in a model with
95-99.9% of the total number of residues. The model can be refined in a program such as X-PLOR (Brunger (1992),
supra) to a suitable crystallographic R-factor. The model data are then saved on computer readable media for use in further
analysis, such as rational drug design.
Rational Design of Drugs that Interact with the PPCA or pPPCA

The determination of the three-dimensional structure of a PPCA or pPPCA, as described herein, provides a
basis for the design of new and specific ligands for the diagnosis and/or treatment of at least one PPCA- or
pPPCA-related pathology.

Several approaches can be taken for the use of the crystal structure of a PPCA or pPPCA in the rational design
of ligands of this protein. A computer-assisted, manual examination of the active site structure is optionally done.
Software such as GRID (Goodford, J. Med. Chem. 28:849-857 (1985)), a program that determines probable
interaction sites between probes with various functional group characteristics and the enzyme surface, is used to
analyze the active site to determine structures of inhibiting compounds. The program calculations, with suitable
inhibiting groups on molecules (e.g., protonated primary amines) as the probe, are used to identify potential hotspots
around accessible positions at suitable energy contour levels. Suitable ligands, as inhibiting or stimulating modulating
compounds or compositions, are then tested for modulating activities of at least one PPCA or pPPCA.
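The grid-probe approach can be illustrated with a deliberately simplified energy function. GRID uses its own empirically parameterized terms; the sketch below swaps in a generic 12-6 Lennard-Jones term plus a Coulomb term with a distance-dependent dielectric, evaluates it for a singly positive probe (loosely standing in for a protonated primary amine) on a small grid, and reports points below an energy contour as candidate hotspots. All atoms, charges, parameters and the contour level are hypothetical.

```python
import itertools
import math

COULOMB_K = 332.0  # approx. kcal*Å/(mol*e^2)


def probe_energy(point, atoms, probe_charge=1.0, epsilon=0.2, r_min=3.5):
    """Toy probe-protein interaction energy (kcal/mol) at `point`.

    atoms: list of (x, y, z, partial_charge) in Å and electron charges.
    Generic 12-6 Lennard-Jones term (one epsilon/r_min for all atoms) plus a
    Coulomb term with distance-dependent dielectric eps = 4r. Not GRID's function.
    """
    energy = 0.0
    for x, y, z, q in atoms:
        r = max(math.dist(point, (x, y, z)), 0.5)  # guard against r = 0
        ratio6 = (r_min / r) ** 6
        energy += epsilon * (ratio6 * ratio6 - 2.0 * ratio6)  # minimum of -epsilon at r_min
        energy += COULOMB_K * probe_charge * q / (4.0 * r * r)
    return energy


def hotspots(atoms, origin, spacing, n, contour):
    """Points of an n x n x n grid whose energy is below `contour`, most favorable first."""
    hits = []
    for i, j, k in itertools.product(range(n), repeat=3):
        p = (origin[0] + i * spacing, origin[1] + j * spacing, origin[2] + k * spacing)
        e = probe_energy(p, atoms)
        if e < contour:
            hits.append((e, p))
    return sorted(hits)


# Hypothetical pair of anionic oxygens (e.g., part of a carboxylate) in a pocket.
site = [(0.0, 0.0, 0.0, -0.5), (2.2, 0.0, 0.0, -0.5)]
found = hotspots(site, origin=(-6.0, -6.0, -6.0), spacing=1.0, n=13, contour=-3.0)
print(len(found), "grid points below -3.0 kcal/mol")
if found:
    print("most favorable:", round(found[0][0], 2), "kcal/mol at", found[0][1])
```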

A diagnostic or therapeutic PPCA or pPPCA modulating ligand of the present invention can be, but is not
limited to, at least one selected from a nucleic acid, a compound, a protein, an element, a lipid, an antibody, a saccharide,
an isotope, a carbohydrate, an imaging agent, a lipoprotein, a glycoprotein, an enzyme, a detectable probe, and an antibody
or fragment thereof, or any combination thereof, which can be detectably labeled as for labeling antibodies. Such labels
include, but are not limited to, enzymatic labels, radioisotopes or radioactive compounds or elements, fluorescent
compounds or metals, chemiluminescent compounds and bioluminescent compounds. Alternatively, any other known
diagnostic or therapeutic agent can be used in a method of the invention.

After preliminary experiments are done to determine the Km of the substrate for each enzyme activity of a
PPCA or pPPCA, the time-dependent nature of modulation and the ligand Ki values are determined (e.g., by the method of
Henderson, Biochem. J. 127:321-333 (1972)). For example, the ligand (or blank where appropriate) and enzyme
are pre-incubated in buffer. Reactions are initiated by the addition of substrate. Aliquots are removed over a suitable
time course, and each is quenched by adding a suitable quenching solution (e.g., sodium hydroxide in
aqueous ethanol). The concentration of product is determined, e.g., fluorometrically, using a spectrometer. Plots of
fluorescence against time can be close to linear over the assay period, and are used to obtain values for the initial velocity
in the presence (Vi) or absence (V0) of ligand. Error is present in both axes in a Henderson plot, making it inappropriate
for standard regression analysis (Leatherbarrow, Trends Biochem. Sci. 15:455-458 (1990)). Therefore, Ki values are
obtained from the data by fitting to a modified version of the Henderson equation for competitive inhibition:

$$Q r^2 + (E - Q - I) r - E = 0$$

where (using the notation of Henderson, Biochem. J. 127:321-333 (1972)):

$$r = \frac{V_0}{V_i}$$

This equation is solved for the positive root with the constraint that

$$Q = K_i \left( \frac{A_t + K_m}{K_m} \right)$$

using PROC NLIN from SAS (SAS Institute Inc., Cary, North Carolina, USA), which performs nonlinear regression using
least-squares techniques. The iterative method used is optionally the multivariate secant method, similar to the
Gauss-Newton method, except that the derivatives in the Taylor series are estimated from the history of iterations rather than
supplied analytically. A suitable convergence criterion is optionally used, e.g., where there is a change in loss function
of less than 101
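The fitting procedure above can be sketched as follows. The quadratic is equivalent to the linear Henderson form I/(1 - Vi/V0) = Q·V0/Vi + E, in which E and I are taken here to be the total enzyme and total ligand concentrations and A_t the substrate concentration; its positive root is r = [-(E - Q - I) + √((E - Q - I)² + 4QE)]/(2Q). In place of PROC NLIN and the multivariate secant method, this Python sketch fits the single unknown Ki by a golden-section search on log10 Ki that minimizes the squared residuals in r; the simulated data, concentrations and search bounds are hypothetical.

```python
import math
import random


def r_model(ki, e_tot, i_tot, a_tot, km):
    """Positive root r = V0/Vi of Q*r**2 + (E - Q - I)*r - E = 0, with Q = Ki*(A + Km)/Km."""
    q = ki * (a_tot + km) / km
    b = e_tot - q - i_tot
    disc = math.sqrt(b * b + 4.0 * q * e_tot)
    # Two algebraically equivalent forms of the positive root; pick the one free of cancellation.
    return 2.0 * e_tot / (b + disc) if b >= 0 else (-b + disc) / (2.0 * q)


def sse(ki, data, e_tot, a_tot, km):
    """Sum of squared residuals in r for data = [(I_total, r_observed), ...]."""
    return sum((r_obs - r_model(ki, e_tot, i, a_tot, km)) ** 2 for i, r_obs in data)


def fit_ki(data, e_tot, a_tot, km, lo=1e-12, hi=1e-3, tol=1e-6):
    """Golden-section search for the Ki minimizing sse, over log10(Ki) in [lo, hi] (M)."""
    f = lambda log_ki: sse(10.0 ** log_ki, data, e_tot, a_tot, km)
    a, b = math.log10(lo), math.log10(hi)
    g = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return 10.0 ** ((a + b) / 2.0)


# Hypothetical assay: 10 nM enzyme, substrate at its Km (10 uM), true Ki = 5 nM, 2% noise.
E_TOT, A_TOT, KM, KI_TRUE = 1e-8, 1e-5, 1e-5, 5e-9
inhibitor = [0.0, 5e-9, 1e-8, 2e-8, 5e-8, 1e-7]
rng = random.Random(1)
data = [(i, r_model(KI_TRUE, E_TOT, i, A_TOT, KM) * (1 + rng.gauss(0, 0.02))) for i in inhibitor]
print(f"fitted Ki = {fit_ki(data, E_TOT, A_TOT, KM):.3g} M (simulated with Ki = {KI_TRUE:.3g} M)")
```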
Once modulating ligands are found and isolated or synthesized, crystallographic studies of the compounds
complexed to a PPCA or pPPCA can be performed. As a non-limiting example, PPCA or pPPCA crystals are soaked
for 2 days in 0.01-100 mM ligand, and x-ray diffraction data are collected on an area detector and/or an image plate
detector (e.g., a MAR image plate detector) using a rotating anode x-ray source. Data are collected to as high a resolution
as possible, e.g., an inner limit of diffraction of 1.5-3.5 Å. An atomic model of the inhibitor is built into the difference
Fourier map. The model can be refined to adjust the atomic positions to improve the fit with the
electron density maps, while maintaining correct stereochemical constraints. The model will preferably have low r.m.s.
deviations from ideal bond lengths and bond angles, as well as a low R-factor (preferably less
than about 25-35%, such as less than about 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, or 25%).

Direct measurements of enzyme inhibition provide further confirmation that the modeled ligands are
modulators of at least one biological activity of a PPCA or a pPPCA. As a non-limiting example, a modification (Chong
et al., Biochim. Biophys. Acta 1077:65-71 (1991)) of the fluorometric assay of Potier et al. (Analyt. Biochem. 94:287-296 (1979))
is optionally used to measure neuraminidase inhibition or stimulation, optionally including determination
of inhibition constants (Ki). Other suitable PPCA activity assays include, e.g., cathepsin A activity (Galjart et al., J. Biol.
Chem. 266:14754-14762 (1991)); endothelin I deamidase activity (Jackman et al., J. Biol. Chem. 267:2872-2875 (1992));
and tachykinin deamidase activity (Jackman et al., J. Biol. Chem. 265:11265-11272 (1990)).

Ligands of a PPCA or pPPCA, based on the crystal structure of this enzyme, are thus also provided by the
present invention. A PPCA or pPPCA ligand is any molecule, compound or composition that is capable of associating
with a PPCA or pPPCA and optionally modulating at least one function or structural feature of a PPCA or pPPCA.
Preferably, a PPCA or pPPCA ligand modulates at least one biological activity of a PPCA or pPPCA. Demonstration
of clinically useful levels of activity, e.g., in vivo activity, is also important; in evaluating PPCA or pPPCA inhibitors,
biological activity in animal models (e.g., rat, mouse, rabbit) using various oral and parenteral routes of administration is
evaluated. Using this approach, it is expected that modulation of a PPCA or pPPCA occurs in suitable animal models,
using the ligands discovered by structure determination and x-ray crystallography.
Evaluation of Therapeutic Potentials of Compositions via a PPCA Animal Model 

The present invention also provides methods for identifying diagnostic or therapeutic ligands of PPCA or
pPPCA via computer RDD, to treat a PPCA-related pathology. Generally, a method for determining the therapeutic or
diagnostic use of a PPCA or pPPCA modulating ligand, to treat a PPCA-related pathology, comprises the steps of
administering a known dose of at least one ligand-containing composition to an animal model having a phenotype
corresponding to a PPCA-related pathology, monitoring the appropriate biological or biochemical parameters, and
comparing the results from treated animals with those from untreated animals. Results indicating the onset or presence of a
PPCA-related pathology are generally referred to herein as "symptoms" of the disease. See, e.g., U.S. Appl. No.
08/397,693, filed March 2, 1995, which is entirely incorporated herein by reference.

Appropriate biological and biochemical parameters that reflect the onset and progression of a PPCA-related
pathology include, but are not limited to, (1) gross biological parameters, e.g., physical appearance (i.e., flattening of
the face, rough haircoat and/or subcutaneous swelling in affected animals) or growth (reduced weight gain); (2) gross
behavioral parameters, e.g., lack of coordination; (3) biochemical assays, e.g., assays of cathepsin A,
N-acetyl-α-neuraminidase or β-galactosidase activities in primary cultures of skin fibroblasts or tissue homogenates; and (4)
histopathological studies (visceromegaly, i.e., enlarged liver and spleen; accumulation of secondary vacuoles in kidney
tissues; etc.).

A first method of evaluating the therapeutic potential of a composition using the transgenic non-human animals
of the invention comprises the steps of:

(1) Administering a known dose of the composition to a first non-human animal having a
phenotype corresponding to a human PPCA-related pathology;

(2) Detecting the time of onset of symptoms in the first non-human animal; and 
(3) Comparing the time of onset of symptoms in the first non-human animal to the time of onset
of symptoms in a second non-human animal having a phenotype corresponding to a human PPCA-related
pathology, which has not been exposed to the composition;
wherein a statistically significant delay in the time of onset of symptoms in the first non-human animal relative to the
time of onset of the symptoms in the second non-human animal indicates the potential of the composition for treating
a PPCA-related pathology.

A second method of evaluating the therapeutic potential of a composition using the non-human animals of the
invention comprises the steps of:

(1) Administering a known dose of the composition to a first non-human animal having a
phenotype corresponding to a human PPCA-related pathology at an initial time, t₀;

(2) Determining the extent of symptoms in the first non-human animal at a later time, t₁; and

(3) Comparing, at t₁, the extent of symptoms in the first non-human animal to the extent of
symptoms in a second non-human animal having a phenotype corresponding to a human PPCA-related
pathology, which has not been exposed to the composition at t₀;

wherein a statistically significant decrease in the extent of symptoms at t₁ in the first non-human animal relative to the
extent of the symptoms at t₁ in the second non-human animal indicates the potential of the composition for treating a
PPCA-related pathology.
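The methods above call for a statistically significant difference without naming a test. One possibility, shown here only as an assumption, is a one-sided two-sample permutation test on group means, applied to onset times (first method) or to symptom scores at the later time point (second method) for groups of treated and untreated animals. The data are invented, and the number of permutations and the add-one p-value correction are illustrative choices.

```python
import random
from statistics import mean


def permutation_test(treated, control, n_perm=10000, seed=0):
    """One-sided permutation p-value for H1: mean(treated) > mean(control).

    First method: pass onset times (a delay means larger values in the treated group).
    Second method: pass symptom scores as (control, treated), since a decrease in the
    treated group is the effect of interest.
    """
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_t = len(treated)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean(pooled[:n_t]) - mean(pooled[n_t:]) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # add-one correction keeps p > 0


# Invented onset times (days) for illustration only.
treated_onset = [62, 70, 68, 75, 71, 66]
control_onset = [50, 55, 48, 58, 53, 51]
print("p =", permutation_test(treated_onset, control_onset))
```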

In the above methods, the composition being tested may comprise a chemical compound administered by
circulatory injection or oral ingestion. The composition being evaluated may alternatively comprise a polypeptide
administered by circulatory injection of an isolated or recombinant bacterium or virus that is live or attenuated, wherein
the polypeptide is present on the surface of the bacterium or virus prior to injection, or a polypeptide administered by
circulatory injection of an isolated or recombinant bacterium or virus capable of reproduction within a non-human
animal, wherein the polypeptide is produced within the non-human animal by genetic expression of a DNA sequence encoding
the polypeptide. Alternatively, the composition being evaluated may comprise one or more nucleic acids, including a
gene from the human genome or a processed RNA transcript thereof. Similarly, the composition being evaluated may
comprise cells removed from a mammal and genetically engineered to overexpress a lysosomal protein or some other
therapeutic polypeptide.

Once the PPCA modulating ligand has been shown to be effective in an animal model, it can then be tested in 
human clinical trials, according to known method steps. 

In the above methods, delivery of the composition being tested to non-human animals is achieved via means
appropriate for the composition being tested, e.g., by diet; by intermittent or continuous intravenous injection of one or
more of the compositions or of a liposome (Rahman and Schein, in Liposomes as Drug Carriers, Gregoriadis, ed., John
Wiley, New York (1988), pages 381-400; Gabizon, A., in Drug Carrier Systems, Vol. 9, Roerdink et al., eds., John
Wiley, New York (1989), pages 185-212) or microparticle (Tice et al., U.S. Patent 4,542,025 (Sep. 17, 1985))
formulation comprising one or more of the compositions; via subdermal implantation of drug-polymer conjugates
(Duncan, R., Anti-Cancer Drugs 3:175-210 (1992)); via microparticle bombardment (Sanford et al., U.S. Patent
4,945,050 (Jul. 31, 1990)); via infusion pumps (Blackshear and Rohde, in Drug Carrier Systems, Vol. 9, Roerdink et
al., eds., John Wiley, New York (1989), pages 293-310); or by other appropriate means known in the art (see, generally,
Remington's Pharmaceutical Sciences, 18th Ed., Gennaro, ed., Mack Publishing Co., Easton, PA (1990)).
Pharmaceutical/Diagnostic Administration

Using compounds or compositions comprising at least one PPCA or pPPCA modulating ligand, the present
invention further provides a method for modulating the activity of a PPCA or pPPCA protein in a cell. In general,
ligands (antagonists or agonists) which have been identified to inhibit or enhance the activity of at least one PPCA or
pPPCA ligand can be formulated so that the ligand can be contacted with a cell expressing at least one PPCA or pPPCA
protein in vivo. The contacting of such a cell with such a ligand results in the in vivo modulation of at least one
biological activity of a PPCA or pPPCA. 

At least one PPCA or pPPCA modulating compound or composition of the invention can be administered by
any means that achieve the intended purpose, using a suitable pharmaceutical composition or formulation. For example,
administration can be by various parenteral routes such as subcutaneous, intravenous, intradermal, intramuscular,
intraperitoneal, intranasal, intracranial, transdermal, or buccal routes. Alternatively, or concurrently, administration can
be by the oral route. Parenteral administration can be by bolus injection or by gradual perfusion over time.
A typical regimen for treatment or prophylaxis comprises administration of an effective amount over a period
of one or several days, up to and including between one week and about six months. It is understood that the dosage
of a diagnostic/pharmaceutical compound or composition of the invention administered in vivo or in vitro will be
dependent upon the age, sex, health, and weight of the recipient, kind of concurrent treatment, if any, frequency of
treatment, and the nature of the diagnostic/pharmaceutical effect desired. The ranges of effective doses provided herein
are not intended to be limiting and represent preferred dose ranges. However, the most preferred dosage will be tailored
to the individual subject, as is understood and determinable by one skilled in the relevant arts. See, e.g., Berkow et al.,
eds., The Merck Manual, 16th edition, Merck and Co., Rahway, N.J. (1992); Goodman et al., eds., Goodman and
Gilman's The Pharmacological Basis of Therapeutics, 8th edition, Pergamon Press, Inc., Elmsford, N.Y. (1990); Avery's
Drug Treatment: Principles and Practice of Clinical Pharmacology and Therapeutics, 3rd edition, ADIS Press, Ltd.,
Williams and Wilkins, Baltimore, MD (1987); Ebadi, Pharmacology, Little, Brown and Co., Boston (1985); Osol et al.,
eds., Remington's Pharmaceutical Sciences, 18th edition, Mack Publishing Co., Easton, PA (1990); Katzung, Basic and
Clinical Pharmacology, Appleton and Lange, Norwalk, CT (1992), which references are entirely incorporated herein
by reference.

The total dose required for each treatment can be administered by multiple doses or in a single dose. The
diagnostic/pharmaceutical compound or composition can be administered alone or in conjunction with other diagnostics
and/or pharmaceuticals directed to the pathology, or directed to other symptoms of the pathology. Effective amounts
of a diagnostic/pharmaceutical compound or composition of the invention are from about 0.1 µg to about 100 mg/kg
body weight, administered at intervals of 4-72 hours, for a period of 2 hours to 1 year, and/or any range or value therein.
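As a purely arithmetic illustration of the dose window stated above, the following sketch converts a per-kilogram dose into an amount per administration and a number of administrations. The function name, the 70 kg body weight and the 1 mg/kg, every-24-hours, 7-day regimen are hypothetical values chosen only to show the calculation; they are not taken from this specification.

```python
import math

# Bounds restated from the text: about 0.1 ug/kg (0.0001 mg/kg) to about 100 mg/kg,
# given every 4-72 hours, for 2 hours to 1 year.
MIN_DOSE_MG_PER_KG = 0.0001
MAX_DOSE_MG_PER_KG = 100.0
MIN_INTERVAL_H, MAX_INTERVAL_H = 4, 72
MIN_PERIOD_H, MAX_PERIOD_H = 2, 365 * 24


def dose_schedule(body_weight_kg, dose_mg_per_kg, interval_h, period_h):
    """Hypothetical helper: (mg per administration, number of administrations, total mg).

    Administrations are assumed to occur at t = 0, interval_h, 2 * interval_h, ...
    for as long as t < period_h.
    """
    if not MIN_DOSE_MG_PER_KG <= dose_mg_per_kg <= MAX_DOSE_MG_PER_KG:
        raise ValueError("dose outside the stated 0.1 ug/kg - 100 mg/kg range")
    if not MIN_INTERVAL_H <= interval_h <= MAX_INTERVAL_H:
        raise ValueError("interval outside the stated 4-72 h range")
    if not MIN_PERIOD_H <= period_h <= MAX_PERIOD_H:
        raise ValueError("period outside the stated 2 h - 1 year range")
    per_dose_mg = body_weight_kg * dose_mg_per_kg
    n_doses = math.ceil(period_h / interval_h)
    return per_dose_mg, n_doses, per_dose_mg * n_doses


print(dose_schedule(70, 1.0, 24, 7 * 24))  # (70.0, 7, 490.0)
```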

The recipients of administration of compounds and/or compositions of the invention can be any mammals.
Among mammals, the preferred recipients are mammals of the Orders Primates (including humans, apes and monkeys),
Artiodactyla (including horses, goats, cows, sheep, pigs), Rodentia (including mice, rats, rabbits, and hamsters), and
Carnivora (including cats and dogs). The most preferred recipients are humans.

Having now generally described the invention, the same will be more readily understood through reference 
to the following example which is provided by way of illustration, and is not intended to be limiting of the present 
invention. 

Example 1: Preparation, Purification and Crystallization of PPCA or pPPCA from Human Cells

The present invention provides, in one aspect, the determination of the three-dimensional structure of the human
protective protein/cathepsin A (PPCA) in the precursor form (pPPCA) by a combination of molecular replacement and
two-fold density averaging. The structure presented here is the first of an enzyme associated with a human PPCA-related
pathology, and the third human lysosomal enzyme structure determined. The structure gives us insight into the zymogen
activation mechanism of pPPCA, as well as the expected 3-D structure of PPCA and its specific and new enzymatic
activities.

PPCA and pPPCA Expression and Purification

Plasmid Constructs. AcMNPV transfer plasmids pJR2 and pBC3 (Figure 1) were derivatives of plasmid
pAc373, carrying the entire polyhedrin gene (Smith et al., 1985). In pJR2 a polylinker with multiple cloning sites
(MCS) was inserted directly 3' of the polyhedrin promoter, replacing a 33-nucleotide segment of the polyhedrin gene
starting with the ATG. pBC3 had the polylinker situated in a similar position as in pJR2, but instead of the 33-nt deletion
this plasmid featured the ATG codon mutated to ACG. Full-length human PPCA cDNA, PPCA54 (Galjart et al., 1988),
and the two deletion cDNA mutants, 32(Δ20) and 20(Δ32) (Galjart et al., 1991), were subcloned into either pJR2 or
pBC3 as EcoRI fragments, using standard procedures (Sambrook et al., 1989) (Figure 1). The 20(Δ32) deletion mutant
was tagged with the human PPCA signal sequence, as reported earlier (Galjart et al., 1991). All cDNA fragments were
engineered to have short 3' and 5' untranslated regions (<10 bp).

Transfection and Selection of Recombinant Baculovirus. Spodoptera frugiperda insect cells (IPLB-Sf21)
were cultured in monolayers at 27 °C in TNM-FH medium (Hink, 1970) supplemented with 10% FBS and antibiotics
(complete medium). Wild-type (wt) AcMNPV virus strain E2 (Smith and Summers, 1978) and recombinant
baculoviruses were propagated on confluent monolayers of Sf21 cells. Recombinant constructs AcPPCA54, AcPPCA32
and AcPPCA20 were generated by cotransfecting Sf21 cells with 1 µg wt-AcMNPV DNA and 10 µg plasmid DNA,
using the calcium phosphate method modified for insect cells (Graham et al., 1973; Carstens et al., 1980; Summers et
al., 1987). Polyhedrin-negative recombinant baculoviruses were then selected and purified by sequential plaque assays,
and verified by dot blot and Southern blot analysis (Summers et al., 1987). Large quantities of inoculum were produced
by infection of insect cells at 25-50% confluency with recombinant virus at a multiplicity of infection (MOI) of
<1 pfu/cell. After 3 to 6 days at 27 °C, when all cells appeared infected, the medium was harvested and centrifuged for
5 min at 1000 rpm to remove detached cells. The titre of the inoculum was determined by plaque assay analysis.
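The two infection regimes used in this Example (an MOI below 1 pfu/cell for amplifying inoculum, and 5-10 pfu/cell for protein production) can be compared with the standard Poisson approximation, in which the fraction of cells receiving at least one infectious particle is 1 - e^(-MOI). The sketch below also computes the inoculum volume needed for a target MOI from a plaque-assay titre; the cell number and titre are hypothetical.

```python
import math


def inoculum_volume_ml(n_cells, moi_pfu_per_cell, titre_pfu_per_ml):
    """Volume of virus stock delivering `moi_pfu_per_cell` plaque-forming units per cell."""
    return n_cells * moi_pfu_per_cell / titre_pfu_per_ml


def fraction_infected(moi_pfu_per_cell):
    """Poisson estimate of the fraction of cells receiving >= 1 infectious particle."""
    return 1.0 - math.exp(-moi_pfu_per_cell)


# Hypothetical example: 2 x 10^7 cells and a stock titre of 1 x 10^8 pfu/ml.
for moi in (0.5, 5, 10):
    vol = inoculum_volume_ml(2e7, moi, 1e8)
    print(f"MOI {moi:>4}: {vol:.2f} ml stock, "
          f"~{fraction_infected(moi):.1%} of cells infected in the first round")
```

Under this approximation, an MOI below 1 leaves most cells to be infected in later rounds of virus replication over the following days, whereas 5-10 pfu/cell infects essentially every cell at once.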

Protein purification and western blotting. Sf21 cells were cultured in either 175 cm² or 500 cm² flasks (triple
flask, Nunc) to near confluency, and infected with recombinant baculoviruses at an MOI of 5-10 pfu/cell. After 1.5 h
incubation at 27 °C, the inoculum was replaced with complete medium for an additional 8 to 10 h. Cell monolayers were
then rinsed with PBS and cultured further for 38 h in unsupplemented Grace's medium. After infection the medium was
collected and centrifuged for 5 min at 1500 g, and for 1 h at 100,000 g (Beckman SW-28 rotor), to remove virus particles.
After centrifugation the supernatant was concentrated 20-fold in an Amicon stirred cell. Glycoproteins were purified
(~60%) using a concanavalin A-SEPHAROSE affinity chromatography column, as described earlier (Verheijen et al.,
1982). Total protein concentration was measured using the method of Smith et al. (1985). Aliquots of the purified
preparation were resolved on 12.5% SDS-polyacrylamide gels under reducing and non-reducing conditions. Gels were
stained either with Coomassie brilliant blue or with silver (Sambrook et al., 1989). For western blotting, proteins were
transferred from gels to IMMOBILON PVDF membranes (Millipore Corp.) using a semidry blotter (The W.E.P. Company).
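Centrifugation steps in this Example are specified partly in rpm (1000 rpm) and partly in g (1500 g, 100,000 g). The standard conversion is RCF = 1.118 × 10^-5 × r × rpm², with r the rotor radius in cm. The sketch below uses hypothetical radii of 15 cm and 16 cm; the radii of the rotors actually used are not given in the text.

```python
import math

K = 1.118e-5  # standard constant relating RCF (x g) to radius (cm) and speed (rpm)


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (multiples of g)."""
    return K * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach `rcf` at `radius_cm`."""
    return math.sqrt(rcf / (K * radius_cm))


# Hypothetical radii, for illustration only.
print(round(rcf_from_rpm(1000, 15.0)))      # about 168 x g at 1000 rpm
print(round(rpm_from_rcf(100_000, 16.0)))   # about 23,600 rpm for 100,000 x g
```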
Development and Use of pPPCA antibodies. A 15-amino-acid peptide (NH2-Cys-Met-Trp-His-Gln-Ala-Leu-
Leu-Arg-Ser-Glu-Asp-Lys-Ala-Arg-COOH) (Figure 5), based on the C-terminal sequence of the 34-kDa PPCA subunit
(amino acids 285-298; Galjart et al., 1988), was synthesized on a peptide synthesizer (Applied Biosystems) and
covalently linked to the carrier protein keyhole limpet hemocyanin using the IMJECT ACTIVATED IMMUNOGEN
CONJUGATION KIT (Pierce). Polyclonal antibodies against the conjugated product were raised in rabbits by multiple
subdermal injections of the protein (40-125 µg) mixed with incomplete Freund's adjuvant (Pierce). Rabbits were bled
34 days after the first injection. The antibodies, designated anti-pep, were tested on immunoblots and by
immunoprecipitation of baculovirus-produced PPCA.
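The immunogen can be checked against the residue range it was designed from: residues 285-298 span 14 amino acids, while the synthesized peptide has 15, which suggests (an inference, not stated in the text) that the N-terminal Cys was added for coupling to the carrier, a common design for peptide immunogens. The sketch below counts the residues and computes the average mass of the free, unconjugated peptide from standard average residue masses.

```python
# Standard average residue masses in daltons (amino acid minus water).
RESIDUE_MASS = {
    "Ala": 71.0788, "Arg": 156.1875, "Asp": 115.0886, "Cys": 103.1388,
    "Gln": 128.1307, "Glu": 129.1155, "His": 137.1411, "Leu": 113.1594,
    "Lys": 128.1741, "Met": 131.1926, "Ser": 87.0782, "Trp": 186.2132,
}
WATER = 18.0153  # added once for the free N- and C-termini

ANTI_PEP_IMMUNOGEN = "Cys-Met-Trp-His-Gln-Ala-Leu-Leu-Arg-Ser-Glu-Asp-Lys-Ala-Arg"


def peptide_average_mass(three_letter_seq):
    """Return (number of residues, average mass in Da) of a linear, unmodified peptide."""
    residues = three_letter_seq.split("-")
    return len(residues), sum(RESIDUE_MASS[r] for r in residues) + WATER


n_res, mass = peptide_average_mass(ANTI_PEP_IMMUNOGEN)
print(n_res, round(mass, 1))   # 15 1844.1
print(298 - 285 + 1)           # 14 residues in the range 285-298
```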

Blots were incubated for at least 12 h in blocking buffer (0.01 M Tris-buffered saline, pH 8.0 (TBS), 0.05%
Tween 20, and 3% (w/v) BSA), and subsequently probed for 2 h with polyclonal PPCA antibodies, anti-54, diluted 1:200
in fresh blocking buffer. They were then washed for 1 h in TBS, 0.05% Tween 20, and incubated for 2 h with alkaline
phosphatase-conjugated anti-rabbit IgG (Sigma, 1:1000 in blocking buffer). Proteins were visualized using alkaline
phosphatase substrate (Sigma; 4-aminodiphenylamine diazonium sulfate, naphthol AS-MX phosphate).

Crystallization of PPCA. Fractions containing the precursor form of the protein, as assayed on an SDS-PAGE
gel, were pooled. Subsequently the protein was concentrated to 5 mg/ml and the buffer exchanged to 50 mM NaAc pH
5.2 or 50 mM MES pH 6.5 using a CENTRICON-10. Crystals were grown using the hanging drop vapor diffusion
technique. Crystals suitable for data collection were grown using a reservoir solution containing 2-10% PEG 8000,
pH 8.0-8.3, 50 mM TRIZMA, 1 mM NaN₃, 0.25% β-octyl glucoside at 4-12 °C. Mixing non-equal volumes of protein
solution (in the range 5-10 µl) and reservoir solution (in the range 2-6 µl) enhanced the occurrence of single large
crystals per drop under these crystallization conditions. The concentration of the protein solution before mixing was
5 mg/ml. Crystal growth was enhanced by macrocrystallization techniques (anything that promotes growth of big
crystals) and in some cases by micro- and macroseeding techniques.
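One way to see how the protein-to-reservoir ratio changes a drop is an idealized vapor-diffusion endpoint: water leaves the drop until its PEG concentration matches that of the reservoir, while PEG and protein stay in the drop. The sketch below assumes the protein solution contains no PEG, the reservoir is large enough that its concentration does not change, and all other components can be ignored; the 8% PEG reservoir is a hypothetical value within the stated 2-10% range, and the model is an illustration rather than the authors' explanation of the observed effect.

```python
def hanging_drop_endpoint(v_protein_ul, c_protein_mg_ml, v_reservoir_ul, peg_reservoir_pct):
    """Idealized drop composition before and after vapor-diffusion equilibration."""
    v_drop = v_protein_ul + v_reservoir_ul
    peg_start = peg_reservoir_pct * v_reservoir_ul / v_drop
    protein_start = c_protein_mg_ml * v_protein_ul / v_drop
    # PEG does not evaporate, so its amount is conserved; the drop shrinks until
    # its PEG concentration equals the reservoir's.
    v_end = v_drop * peg_start / peg_reservoir_pct
    protein_end = c_protein_mg_ml * v_protein_ul / v_end
    return {
        "peg_start_pct": peg_start,
        "protein_start_mg_ml": protein_start,
        "v_end_ul": v_end,
        "protein_end_mg_ml": protein_end,
    }


# 5 mg/ml protein (as in the text) with a hypothetical 8% PEG 8000 reservoir.
for v_prot, v_res in ((5, 5), (10, 2)):
    r = hanging_drop_endpoint(v_prot, 5.0, v_res, 8.0)
    print(v_prot, v_res, round(r["peg_start_pct"], 2), round(r["protein_end_mg_ml"], 1))
```

Under these assumptions the final drop volume equals the volume of reservoir solution added, so a 10 µl + 2 µl drop starts at about 1.3% PEG and ends near 25 mg/ml protein, whereas a 5 µl + 5 µl drop starts at 4% PEG and ends at 5 mg/ml.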

Example 2: Structure Determination of a pPPCA Crystallized from Human Cells 

Data Collection, Data Processing and Reduction

To allow for data collection at cryotemperatures, the crystals were cryoprotected by adding glycerol in 5%-10%
steps to a solution of about 12% PEG 8000, 50 mM TRIZMA, pH 8.0, 1 mM NaN₃, 0.25% β-octyl glucoside, which
served as an artificial mother liquor. The crystals were incubated for half an hour at 4 °C after each addition of
glycerol. The final mother liquor contained 30% glycerol. Increasing the glycerol concentration gradually was needed
to keep the crystals from cracking.
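The stepwise cryoprotection can be planned as a small calculation: to raise a solution of volume V from c_old to c_new percent (v/v) glycerol using a stock of concentration c_stock, one adds V(c_new - c_old)/(c_stock - c_new). The sketch below assumes additions of neat (100%) glycerol, which also dilutes the other mother-liquor components; exchanging into pre-mixed artificial mother liquor at each glycerol level avoids that dilution. The 10 µl starting volume and 10% step are hypothetical choices within the stated 5-10% steps.

```python
def glycerol_levels(final_pct=30.0, step_pct=5.0):
    """Glycerol concentrations (% v/v) reached after each addition."""
    levels, c = [], 0.0
    while c < final_pct:
        c = min(c + step_pct, final_pct)
        levels.append(c)
    return levels


def stock_volume(v_ul, c_old_pct, c_new_pct, c_stock_pct=100.0):
    """Volume of glycerol stock raising v_ul of solution from c_old to c_new (% v/v)."""
    return v_ul * (c_new_pct - c_old_pct) / (c_stock_pct - c_new_pct)


def cryo_schedule(v_start_ul, final_pct=30.0, step_pct=5.0, soak_min=30):
    """List of (target %, ul of stock to add, soak time in min), one entry per step."""
    plan, v, c_old = [], v_start_ul, 0.0
    for c_new in glycerol_levels(final_pct, step_pct):
        add = stock_volume(v, c_old, c_new)
        plan.append((c_new, add, soak_min))
        v, c_old = v + add, c_new
    return plan


for target, add, soak in cryo_schedule(10.0, step_pct=10.0):
    print(f"to {target:.0f}% glycerol: add {add:.2f} ul, soak {soak} min")
```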

Diffraction data were collected at the Stanford Synchrotron Radiation Laboratory (SSRL) to 2.0 Å at -178 °C
on a MAR imaging plate at a wavelength of 1.08 Å on beam line 7-1. The diffraction data (corresponding to the atomic
coordinates of monomer 1; the coordinates of the other monomer are obtained by matrix transformation of these
coordinates, as presented herein) were processed and reduced using MOSFLM version 5.2 from the CCP4 program
package (SERC (UK) Collaborative Computing Project 4, Daresbury Laboratory, UK, 1979). The program REFIX
(Kabsch (1993), infra) was used for auto-indexing. Using the CCP4 program suite (SERC (UK) Collaborative
Computing Project 4, Daresbury Laboratory, UK, 1979), the intensities were scaled (ROTAVATA), merged
(AGROVATA), then converted to amplitudes and truncated with the program TRUNCATE. Statistics of the data
collected are given in Table 1. The V_M (Matthews, B.W., J. Mol. Biol. 33:491-497 (1968)) is 3.2 Å³/Da for 2 monomers
in the asymmetric unit, corresponding to a solvent content of 62%.
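The quoted solvent content follows from Matthews' relation V_solv = 1 - 1.23/V_M, where the constant 1.23 corresponds to a protein partial specific volume of about 0.74 cm³/g: with V_M = 3.2 Å³/Da, 1 - 1.23/3.2 ≈ 0.62. The sketch below also shows how V_M itself is obtained from the unit-cell volume; the cell volume and monomer mass in the second example are placeholders, since neither value appears in this section.

```python
def matthews_coefficient(cell_volume_A3, monomer_mass_da, n_asym_units, monomers_per_asu):
    """V_M (A^3/Da): unit-cell volume divided by the protein mass it contains."""
    return cell_volume_A3 / (monomer_mass_da * n_asym_units * monomers_per_asu)


def solvent_fraction(vm_A3_per_da):
    """Matthews (1968): fraction of the crystal volume occupied by solvent."""
    return 1.0 - 1.23 / vm_A3_per_da


print(f"{solvent_fraction(3.2):.1%}")  # 61.6%, i.e. the 62% quoted above

# Placeholder values only: a 50 kDa monomer, 2 monomers per asymmetric unit and
# 4 asymmetric units per cell (as for a primitive orthorhombic space group).
vm = matthews_coefficient(1.28e6, 50_000, 4, 2)
print(round(vm, 2), f"{solvent_fraction(vm):.1%}")
```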
Molecular Replacement 

Search Model: The best molecular replacement results were obtained using a 'multi-Ala core' as a search probe.
The 'multi-Ala core' search model was constructed from the atomic coordinates of the CPW monomer (Liao et al., 1992),
based on the sequence alignment presented in Figure 15. Regions expected to deviate in structure between PPCA and
CPW (i.e. regions with low sequence identity or located in loops) were deleted from the model. The 125 residues identical
in PPCA and CPW were left in the model; 112 residues were truncated to alanine. The remaining 94 residues, though
differing between CPW and PPCA, were considered sufficiently similar in size that the CPW residue was left unchanged
in the model. The resulting 'multi-Ala core' monomer consisted of 331 residues, constituting a large portion of the core
domain and little atomic information for the 'cap' domain (see Figure 1). The model contained 30% of the expected
protein scattering mass, given that there are two monomers in the asymmetric unit. The sequence identity between this
search model and the true PPCA structure was 37.7%.
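The bookkeeping behind the 'multi-Ala core' can be sketched as a per-position decision over the CPW/PPCA alignment: delete template residues in excluded regions or without a PPCA counterpart, keep identical residues, keep size-similar residues, and truncate the rest to alanine. The size classes below are a hypothetical criterion for "sufficiently similar in size", not the one used by the authors, and the short alignment is invented. As a check on the counts quoted above, 125 + 112 + 94 = 331 residues, and 125/331 ≈ 37.8%, in line with the 37.7% sequence identity of the search model.

```python
from collections import Counter

# Hypothetical side-chain size classes (0 = smallest); not the authors' criterion.
SIZE_CLASS = {
    "G": 0, "A": 0, "S": 0, "C": 0,
    "T": 1, "V": 1, "P": 1, "D": 1, "N": 1,
    "I": 2, "L": 2, "M": 2, "E": 2, "Q": 2, "K": 2, "H": 2,
    "F": 3, "Y": 3, "W": 3, "R": 3,
}


def multi_ala_decisions(template_aln, target_aln, include):
    """Decide the fate of each template (CPW) residue in the search model.

    template_aln, target_aln: equal-length aligned sequences with '-' for gaps.
    include: per-column booleans, False for regions excluded from the model (e.g. loops).
    Returns 'delete', 'identical', 'keep' or 'ala' for each template residue.
    """
    decisions = []
    for t, q, inc in zip(template_aln, target_aln, include):
        if t == "-":
            continue  # no template atoms to place in this column
        if not inc or q == "-":
            decisions.append("delete")
        elif t == q:
            decisions.append("identical")
        elif SIZE_CLASS[t] == SIZE_CLASS[q]:
            decisions.append("keep")  # template side chain left unchanged
        else:
            decisions.append("ala")   # truncated to alanine
    return decisions


tmpl = "AK-WLDEG"
targ = "AKRFIGE-"
print(Counter(multi_ala_decisions(tmpl, targ, [True] * len(tmpl))))
print(125 + 112 + 94, round(125 / 331 * 100, 1))
```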

Rotation Function, PC Refinement and Translation Function: Native data of 8-4 Å were used in the
molecular replacement calculations. The rotational searches utilized a real-space Patterson search method, as
implemented in X-PLOR (Steigemann, 1974; Huber, 1985; Brünger, 1992a), with a Patterson vector cutoff of 21 Å. The
self-rotation function failed to reveal any non-crystallographic two-fold symmetry relating the two monomers in the
asymmetric unit. In addition, the native self-Pattersons did not reveal the presence of a non-crystallographic two-fold
axis parallel to a crystallographic axis. These results indicated that the two monomers in the asymmetric unit might not
form a dimer together. The cross-rotation function was carried out to find the orientations of the two monomers in the
asymmetric unit as follows. Patterson vector sets were calculated for the search model and the native data, and the 8000
strongest Patterson vectors were used in the rotation function. The rotational space, restricted to the asymmetric unit of
the rotation function according to Rao et al., 1980, was sampled by rotating the Patterson vectors from the search model
around Eulerian angles θ1, θ2, and θ3, while sampling θ2 in angular grid intervals of 2.5°. The 5000 highest rotation
function grid points, resulting from the product function of the two Patterson vector sets, were selected. The grid points
(differing less than 8° around any given axis) were then clustered. The result was a list of 169 possible solutions for
the rotation function, each corresponding to a set of three angles describing an orientation. The two top solutions were
3.9 and 3.8 sigma above the mean. PC-refinement (Brünger, 1990) was carried out to optimize each of the 169 possible
solutions using the complete search model as a single rigid body. This yielded two orientations with PC-indices of 0.043
and 0.051, respectively. The orientations of these solutions were (θ1 = 261.4°, θ2 = 36.22°, θ3 = 147.28°) and
(θ1 = 18.52°, θ2 = 47.40°, θ3 = 23.22°), respectively. In contrast, the rest of the possible solutions yielded an average
PC-index of 0.022.
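Clustering rotation-function peaks requires a distance between orientations. A convenient one is the angle κ of the rotation that takes one orientation onto the other, obtained from cos κ = (trace(R1ᵀR2) - 1)/2. The sketch below builds rotation matrices from Eulerian angles in a z-x-z convention, which is an assumption (X-PLOR's own convention may differ), and greedily clusters scored solutions with an 8° cutoff on κ; this is a related but not identical criterion to the per-axis 8° rule described above. The peaks and scores are hypothetical.

```python
import math


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def euler_matrix(t1, t2, t3):
    """Rotation matrix from Eulerian angles in degrees (z-x-z convention assumed)."""
    a, b, c = (math.radians(v) for v in (t1, t2, t3))
    return matmul(rot_z(a), matmul(rot_x(b), rot_z(c)))


def kappa(R1, R2):
    """Rotation angle (degrees) of the rotation taking orientation R1 onto R2."""
    M = matmul(transpose(R1), R2)
    cos_k = (M[0][0] + M[1][1] + M[2][2] - 1.0) / 2.0
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_k))))


def cluster_solutions(solutions, cutoff_deg=8.0):
    """Greedy clustering of (angles, score) pairs, highest score first.

    Each solution joins the first existing cluster whose representative lies within
    cutoff_deg (measured as kappa); otherwise it starts a new cluster.
    """
    clusters = []
    for angles, score in sorted(solutions, key=lambda s: s[1], reverse=True):
        R = euler_matrix(*angles)
        for cl in clusters:
            if kappa(cl["R"], R) < cutoff_deg:
                cl["members"].append(angles)
                break
        else:
            clusters.append({"R": R, "best": angles, "score": score, "members": [angles]})
    return clusters


# Hypothetical peaks: (10, 20, 30) and (12, 20, 30) differ by 2 degrees in theta1 and merge.
peaks = [((10.0, 20.0, 30.0), 3.9), ((100.0, 50.0, 30.0), 3.8), ((12.0, 20.0, 30.0), 3.5)]
for cl in cluster_solutions(peaks):
    print(cl["best"], cl["score"], len(cl["members"]))
```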

Individual translation function calculations were performed on a 1 Å grid. A translational solution was found
for each orientation, at positions (x = 33.30, y = 51.97, z = 12.79) and (x = 25.23, y = 28.58, z = 22.02) with respect
to the crystallographic center, at 7.7σ and 8.8σ above the mean, respectively. The R-factors for the individual solutions
were 55.6% and 54.8% in the resolution range 8.0 to 4.0 Å, with correlation coefficients (CC) of 0.095 and 0.114. A
combined translation function was calculated to place each solution relative to the same crystallographic origin, resulting
in an R-factor of 52.8% for data between 8.0 and 4.0 Å, bringing the R-factor down to 51.3% and increasing the CC to 0.22.
The molecular packing was assessed on a graphics workstation, which revealed no clashes between the placed search
probes. However, a very large amount of empty space was present. The packing showed that the asymmetric unit
contained two half dimers, each forming a dimer with another monomer in a neighboring unit cell. The two cores in
the asymmetric unit were related by κ = 73° around an axis tilted 15.5° off the crystallographic a axis, lying in the a,c
plane.
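The R-factor and correlation coefficient quoted for the translation solutions measure agreement between observed and model amplitudes. The sketch below uses the common definitions, R = Σ||Fobs| - k|Fcalc|| / Σ|Fobs| with a least-squares scale k, and a Pearson correlation of the amplitudes; X-PLOR's translation function may compute its correlation on different quantities (for example intensities or normalized structure factors), so values are not directly comparable. The amplitude lists are made up.

```python
import math


def least_squares_scale(f_obs, f_calc):
    """Scale k minimising sum((|Fobs| - k|Fcalc|)^2)."""
    return sum(o * c for o, c in zip(f_obs, f_calc)) / sum(c * c for c in f_calc)


def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - k|Fcalc| |) / sum(|Fobs|)."""
    k = least_squares_scale(f_obs, f_calc)
    return sum(abs(o - k * c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def correlation_coefficient(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up amplitudes for a partially correct model.
f_obs = [120.0, 85.0, 40.0, 60.0, 150.0, 30.0]
f_calc = [100.0, 60.0, 55.0, 30.0, 110.0, 45.0]
print(f"R = {r_factor(f_obs, f_calc):.1%}, CC = {correlation_coefficient(f_obs, f_calc):.2f}")
```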

Iterative Model Building and Two-fold Averaging

Initial Electron Density Map: A 2m|Fobs| - D|Fcalc| SigmaA-weighted map (Read, 1986) was calculated using
|Fcalc| values and phases from the molecular replacement solution. The map was contoured at 1σ and showed good density
for most of the core. Density emerged for many side chains where the input model residue had been an Ala, indicating
that the molecular replacement solution was correct.
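For reference, the SigmaA-weighted map uses, for each acentric reflection, the coefficient (2m|Fobs| - D|Fcalc|)·exp(iφcalc), where m is the figure of merit and D a scale factor that SigmaA estimates in resolution shells (Read, 1986); centric reflections use m|Fobs| instead. The sketch below takes m and D as given inputs; the numbers are illustrative only.

```python
import cmath
import math


def sigmaa_map_coefficient(f_obs, f_calc, phi_calc_deg, m, d, centric=False):
    """Complex map coefficient for one reflection, phased with the model phase."""
    amplitude = m * f_obs if centric else 2.0 * m * f_obs - d * f_calc
    return cmath.rect(amplitude, math.radians(phi_calc_deg))


# Illustrative values: |Fobs| = 100, |Fcalc| = 80, phase 45 deg, m = 0.9, D = 0.8.
c = sigmaa_map_coefficient(100.0, 80.0, 45.0, 0.9, 0.8)
print(round(abs(c), 3), round(math.degrees(cmath.phase(c)), 3))  # 116.0 45.0
```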
First Model Built: The two rotated and translated search probes formed the starting point for model building
of the PPCA precursor. The non-crystallographic symmetry (NCS) matrix relating the two cores was determined using
the 'lsq_explicit' option in the computer program O (Jones et al., 1991). Subsequently a 'best monomer' was built by
superimposing the electron densities from each monomer core and adjusting the model accordingly. Residues were only
incorporated in the model where the electron density was visible for the complete side chain. Residues from the search
model for which no density was visible were removed. An alanine was built in the model at places where electron
density for a side chain was partial. In this manner 294 residues, i.e. 65% of the Cα atoms, were built in the 'best
monomer' core. The second monomer was generated from the 'best monomer' model using the NCS operator relating
the two monomers in the asymmetric unit. At this point the data set was partitioned into a working set and a test set
consisting of 5% of the reflections between 8 and 2.2 Å, to monitor R_free (Brünger et al., 1992b). The working data set
was used for rigid body and positional refinement. For averaging and map calculations the unpartitioned data set was
used. Twenty-five cycles of refinement using the two 'best monomer cores' positioned in the asymmetric unit as rigid
bodies and data from 8.0-3.0 Å resulted in an R-factor of 53.5% for this resolution range. The atomic coordinates of this
partial model were used to calculate a new 2m|Fobs| - D|Fcalc| SigmaA-weighted map, which we called the 'best monomer
map'.
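Setting aside the R_free test set amounts to flagging a random 5% of the reflections in the 8-2.2 Å range and never using them in refinement; R_free is then the R-factor of the current model evaluated over those reflections only. The sketch below does the flagging with a seeded random generator; the data layout (Miller index plus resolution d) and the synthetic reflection list are assumptions for illustration.

```python
import random


def flag_test_set(reflections, d_low=8.0, d_high=2.2, fraction=0.05, seed=1):
    """Split (hkl, d) reflections with d_high <= d <= d_low into work and test lists."""
    rng = random.Random(seed)
    work_set, test_set = [], []
    for hkl, d in reflections:
        if not d_high <= d <= d_low:
            continue  # outside the refinement range; used for neither R nor R_free
        (test_set if rng.random() < fraction else work_set).append(hkl)
    return work_set, test_set


# Synthetic reflections between 2.5 and 7.5 A, plus one at 1.9 A outside the range.
synthetic = [((h, 0, 0), 2.5 + 0.0005 * h) for h in range(10_000)] + [((0, 0, 1), 1.9)]
work_set, test_set = flag_test_set(synthetic)
print(len(work_set), len(test_set), f"{len(test_set) / (len(work_set) + len(test_set)):.1%}")
```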

Averaging: Search for Missing Density: The phasing power from the rigid-body-refined 'best monomer
cores', consisting of 294 residues per core, was insufficient to bring back interpretable electron density for the missing
part of the model, 158 residues per monomer. To overcome this, a 'bootstrapping' procedure was applied, entailing
density averaging using RAVE (Kleywegt & Jones, 1994a) and model expansion. The 'best monomer map' and the
rigid-body-refined 'best monomer cores' served as the starting point for this procedure.

Six bootstrapping cycles were carried out, called bmc1 through bmc6, allowing the model to be extended
in stepwise increments. Figure 16 shows a scheme of the steps incorporated in one bootstrapping cycle. After a cycle
in which the model had undergone major expansion, a new molecular mask was calculated with MAMA (Kleywegt &
Jones, 1994b) for use in the subsequent bootstrapping cycle. No phase recombination was applied between
bootstrapping cycles. At the end of each cycle the inverted phases and inverted amplitudes were discarded.
The NCS operator was re-optimized after cycle bmc3. The resolution range of the data included in the bootstrapping
cycle started with 15-3.0 Å for bmc1 and was gradually extended to 15-2.7 Å in bmc6. The bootstrapping procedure
is summarized in Table 2. To optimize the bootstrapping procedure, consideration was given to the molecular mask used
in the averaging, the model building strategy and the refinement procedure.
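The core of each averaging step can be illustrated with a toy two-fold NCS average: for every grid point x inside the monomer-1 mask, the density at x and at its NCS mate Rx + t is replaced by their mean, so signal common to both copies is kept while noise that differs between them is reduced. In this sketch the density is a dictionary on integer grid points and the NCS operator maps grid points exactly onto grid points; real programs such as RAVE interpolate density, handle the full unit cell and its crystallographic symmetry, and treat regions outside the mask separately. The operator and density values are hypothetical.

```python
def apply_ncs(R, t, x):
    """Apply the NCS operator x -> R x + t to a grid point."""
    return tuple(sum(R[i][j] * x[j] for j in range(3)) + t[i] for i in range(3))


def twofold_ncs_average(density, mask_points, R, t):
    """Return a copy of `density` averaged over a two-fold NCS operator within the mask."""
    averaged = dict(density)
    for x in mask_points:
        y = apply_ncs(R, t, x)
        if x not in density or y not in density:
            continue  # a real implementation would interpolate instead
        mean = 0.5 * (density[x] + density[y])
        averaged[x] = mean
        averaged[y] = mean
    return averaged


# Two-fold rotation about z through the origin.
R2 = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]
rho = {(1, 0, 0): 1.0, (-1, 0, 0): 3.0, (2, 0, 0): 5.0, (-2, 0, 0): 5.0, (0, 3, 0): 7.0}
avg = twofold_ncs_average(rho, [(1, 0, 0), (2, 0, 0)], R2, (0, 0, 0))
print(avg[(1, 0, 0)], avg[(-1, 0, 0)], avg[(2, 0, 0)], avg[(0, 3, 0)])
```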
Molecular masks: Four different masks were constructed in total. The atomic radius of all atoms was set to
4 Å to calculate each mask. The masks were then manually modified using mask editing options in O (Jones et al., 1991).
Mask 1 was constructed around the 'best monomer core'. Subsequently it was greatly enlarged by multiple blocks of
10-15 Å³ in the regions where the model was incomplete (Figure 17). This was crucial to prevent the density in the
insertion areas from being flattened during the averaging step. Approximately one half of the dimer interface was
estimated to be formed by regions from the missing cap domain. Major expansions of the mask in this area were made
to accommodate this. This resulted in a serious overlap problem when the mask was duplicated to cover a complete
dimer. The mask was reduced where overlap occurred with the 'overlaptrim' option of MAMA. After several
bootstrapping cycles, newly incorporated polypeptide fragments were carefully assigned to one of the two monomers
forming the dimer, and the mask at the dimer interface areas was manually adjusted accordingly. Essentially, the masks
were deliberately kept far too large in regions where the model was missing, in order to avoid erroneous flattening of
electron density. In contrast, the masks were tightened around the areas of the molecule where the model was complete.

Model Building: A conservative model building strategy was adopted. Initially, side chains in the core region were
only mutated to fit the PPCA amino acid sequence, and, where the density was clear, poly-alanine fragments were
built in the insertion areas (loops and the cap domain). Newly included atoms were given a B-factor of 20 Å². Only
once models bmc5 and bmc6 were obtained was the electron density of sufficient quality to allow side chains to be
incorporated confidently in the cap domain (residues 190-303). At this stage the Cα trace was virtually complete for
the whole dimer and the sequence could be fit unambiguously.

Refinement: Positional refinement was postponed until after 3 cycles of bootstrapping, resulting in a
model containing 91% of the Cα atoms. Forty steps of positional refinement were then carried out to improve the
geometry of the model. Subsequently only one of the refined monomers was kept, and the other was generated using
the NCS operators. The rationale for delaying the positional refinement is addressed in the discussion.

Completing the model: deviations from two-fold symmetry. It was possible to add 148 residues and 185 side
chains per monomer after a total of 6 bootstrapping cycles. At this stage, each subunit contained 442 residues and 413
side chains, i.e. 98% of the Cα atoms and 91% of the side-chain atoms. The gradual model expansion as a function of
the bootstrapping cycle is shown in Figure 18.

Twenty residues were still missing in the asymmetric unit at this stage. These were localized to two stretches
per monomer (260-262 and 287-292). With most of the scattering mass incorporated, the monomers from model bmc6
were refined individually with X-PLOR (Brünger, 1992a) in an attempt to retrieve electron density for the still-missing
residues. After 40 steps of positional refinement using data from 8.0-2.6 Å, the R-factor dropped significantly from
40.2% to 33.2%. The model was further positionally refined using a full weight (WA) on the crystallographic term. The
data included in the refinement were gradually extended to 2.2 Å. At 2.4 Å resolution individual B-factors were refined
and their distribution checked as a function of atom location (i.e., low B-factors in the core and high B-factors on the
surface). Cycles of refinement and refitting allowed 18 missing residues to be added. Essentially, almost the complete
cap domain was retrieved using the bootstrapping procedure, as shown in Figure 19. It became apparent from the refined
maps that the two stretches of missing amino acids adopted very different conformations in the two monomers (with
an average r.m.s.d. of as much as 7.9 Å for the Cα atoms of residues 287-292). For this reason, electron density for these
regions had not been retrieved in the two-fold averaging process. The stepwise improvement of the electron density
maps along with averaging, model expansion and refinement is shown in Figure 6.
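The 7.9 Å figure is a root-mean-square deviation over paired Cα positions. Once the second monomer has been brought onto the first (for example with the NCS operator), the r.m.s.d. is the square root of the mean squared distance between corresponding atoms, as in the sketch below; the loop coordinates are invented for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """r.m.s.d. between paired atoms that are already in a common frame (no fitting)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq = sum((p - q) ** 2 for a, b in zip(coords_a, coords_b) for p, q in zip(a, b))
    return math.sqrt(sq / len(coords_a))


# Invented C-alpha positions (in Angstrom) for a short loop in two monomers.
loop_mol1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
loop_mol2 = [(0.0, 0.0, 0.0), (3.8, 6.0, 0.0), (7.6, 9.0, 4.0)]
print(round(rmsd(loop_mol1, loop_mol2), 2))
```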

The program ARP (Lamzin & Wilson, 1993) was used to check our model, in particular the region at the dimer
interface. Prior to the final round of positional refinement, an |F|/σ(F) cutoff was applied to reject 10% of the weakest
data, and an anisotropic scale factor was applied to offset the decreased resolution along the crystallographic a axis.
The final model has good geometry, with a final R-factor of 21.3% (R_free of 26.8%) for data between 8.0 and 2.2 Å (see
Table 3). A Ramachandran plot is given in Figure 21. The r.m.s. coordinate error is 0.282 Å as calculated by SigmaA
(Read, 1986). The average phase difference between the initial molecular replacement model and the currently refined
model is calculated to be 71° for data between 10 and 2.2 Å.
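Because phases are circular, a mean phase difference must fold each difference into the 0-180° range before averaging. For unrelated phase sets the expected mean is 90°, so the 71° quoted above indicates starting phases only modestly better than random, consistent with the description of the molecular replacement phases as very poor. The sketch below computes an unweighted or weighted mean; which weighting was used for the 71° value is not stated, and the example phases are arbitrary.

```python
def phase_difference(a_deg, b_deg):
    """Absolute difference between two phases, folded into [0, 180] degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return 360.0 - d if d > 180.0 else d


def mean_phase_difference(phases_a, phases_b, weights=None):
    """(Optionally weighted) mean absolute phase difference in degrees."""
    if weights is None:
        weights = [1.0] * len(phases_a)
    total = sum(w * phase_difference(a, b) for a, b, w in zip(phases_a, phases_b, weights))
    return total / sum(weights)


print(phase_difference(10.0, 350.0))                      # 20.0
print(mean_phase_difference([10.0, 0.0], [350.0, 90.0]))  # 55.0
```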

The structure determination of PPCA is special in that two-fold averaging could be applied to refine very poor
molecular replacement phases, enabling us to retrieve electron density for 148 residues and 185 side chains per 
monomer. In total 314 complete residues were added per asymmetric unit, equivalent to about 35 kDa of protein. In 
retrospect we feel that a number of factors contributed to a successful structure determination. 

Crystal Packing. Each monomer in the crystal interacts with four non-crystallographically related
monomers. By far the most extensive contact is with a non-crystallographically related monomer, generating the
physiological dimer. The three additional contacts are extensive crystal contacts ranging from 200-800 Å² averaged per
monomer. The largest non-dimer crystal contact involves the precursor loops from two crystallographically independent
monomers (regions 265-267 and 281-295 from monomer 1 with residues 281-293 from monomer 2) making intimate contact
with each other. Summed together these loops create an intermolecular buried surface of 1680 Å². We believe that this
stabilizes an otherwise very flexible area, possibly explaining the good diffraction qualities of the P2₁2₁2 crystals.

It is also in this crystal contact that we find the deviating spatial conformation and secondary structure between
the two monomers mentioned before. The electron density in this region is of very good quality, with average
temperature factors of 16.6 Å² for main-chain and 18.3 Å² for side-chain atoms.

pPPCA and the Hydrolase Family. The fold of pPPCA belongs to the large hydrolase fold family containing
enzymes such as the serine carboxypeptidases, dehalogenase, various lipases and acetylcholinesterase (Ollis et al.
(1992), infra), which have various catalytic functions. Though the central core is the same (a central β-sheet flanked
by α-helices on both sides), the proteins in this family all seem to have different 'cap' domains, with respect to both fold
and size (Figure 7A-F). pPPCA has one of the largest cap domains, comprising 121 residues that form the three-helix
bundle of the helical subdomain and a three-stranded β-sheet of the maturation subdomain.

Major Differences and Comparison With the Serine Carboxypeptidases. The overall fold of the pPPCA
monomer is similar to that of the wheat and yeast serine carboxypeptidases (Endrizzi et al. (1994), infra; Ollis et al.
(1992), infra). The complete core domains of pPPCA and CPW superimpose with an r.m.s. deviation of 1.7 Å for 302
Cα atoms, with 38% sequence identity. Deleting major deviating loops from the core domain allows pPPCA to
superimpose with an r.m.s. deviation of 1.2 Å onto CPW and CPY (293 equivalent Cα atoms with 40% sequence identity
for CPW/pPPCA, and 271 equivalent Cα atoms with 42.2% identity for CPY/pPPCA).

The cap domain in pPPCA differs significantly from its CPW and CPY counterparts. The pPPCA structure
reveals a large maturation subdomain not present in the structures of CPW and CPY, for which the structures of the
enzymatically active forms are known. All three enzymes contain a three-helix bundle in the cap domain. The sequence
identity between the three proteins in this region is very low (ca. 12%). In contrast, PPCA shows a much greater
deviation. Hα1 superimposes reasonably well with the CPW counterpart, maintaining the same general orientation with
respect to the core domain (requiring a rotation of only 7.4°). But helices Hα2 and Hα3 have undergone major rotations
with respect to Hα1 and the core domain, by κ = 28.5° and κ = 93.4°, respectively (Figure 8A).

Due to the integral role of the cap domain in forming the dimer interface, the dimers of PPCA and CPW were
compared. In the pPPCA and CPW dimers the monomers are oriented differently with respect to each other.
Superposition of the core domain of one monomer from each dimer shows that the second monomers (completing
the respective dimers) differ by a remarkable 15° in orientation (Figure 8B). Thus, it appears that the extensive
differences in the cap domains lead to a different arrangement of the subunits in the dimers of PPCA and CPW.

Catalytic Triad and Enzymatic Mechanism. Our structure shows that the precursor PPCA has all the elements
proposed for the enzymatic machinery of the serine carboxypeptidase family (Liao et al. (1992), infra; Endrizzi et al.
(1994), infra), and it is now found to be the third structure elucidated belonging to this family of enzymes, after CPW
and CPY. The catalytic triad in the active site of pPPCA is formed by residues Ser 150, His 429 and Asp 372. The Oγ
of Ser 150 forms a good hydrogen bond with the Nε2 of His 429, with an N to O distance of 2.8 Å. The Nδ1 of His 429
is 2.7 Å from the Oδ2 and 3.3 Å from the Oδ1 of Asp 372. Further, two backbone amides appear to orient the
carboxylate group of Asp 372. The N of Ala 374 is at a distance of 3.0 Å from the Oδ1 of Asp 372, and the N of Cys 375
is at a distance of 2.9 Å from the Oδ2 of Asp 372.

The oxyanion hole proposed to stabilize the negatively charged tetrahedral intermediate in serine
carboxypeptidases is formed by the backbone amides of Gly 57 and Tyr 151 in PPCA. The 32 atoms of the catalytic
triad residues plus the oxyanion hole amides from PPCA, CPY and CPW superimpose with an r.m.s. deviation of 0.4 Å,
indicating the very high degree of structural similarity of the active site in the PPCA precursor with those in the fully
active enzymes CPY and CPW (see Table 4). The carboxylate of Asp 372 and the imidazole of His 429 in PPCA are
non-planar, making an angle of approximately 60° between the imidazole and the carboxylate. A similar non-planarity
has been observed in CPW and CPY, in contrast to the planar orientation found in subtilisin- and trypsin-type serine
proteases (McPhalen et al., Biochemistry 27:6582-6598 (1988)).
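The roughly 60° angle between the His 429 imidazole and the Asp 372 carboxylate is an angle between planes, which can be computed from their normal vectors. The sketch below defines each plane by three atoms (for the imidazole, for example, CG, ND1 and NE2; for the carboxylate, CG, OD1 and OD2), a simplification of fitting a least-squares plane through all ring atoms; the coordinates in the example are synthetic.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def plane_normal(p1, p2, p3):
    """Normal vector of the plane through three (non-collinear) atoms."""
    return _cross(_sub(p2, p1), _sub(p3, p1))


def interplane_angle(plane_a, plane_b):
    """Angle between two planes (0-90 degrees), each given as three atom positions."""
    na, nb = plane_normal(*plane_a), plane_normal(*plane_b)
    cos_angle = abs(_dot(na, nb)) / math.sqrt(_dot(na, na) * _dot(nb, nb))
    return math.degrees(math.acos(min(1.0, cos_angle)))


# Synthetic check: the xy plane versus a plane tilted 60 degrees about the x axis.
xy_plane = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
tilted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
          (0.0, math.cos(math.radians(60)), math.sin(math.radians(60)))]
print(round(interplane_angle(xy_plane, tilted), 1))  # 60.0
```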
In pPPCA, a pair of glutamic acid residues (Glu 69 and Glu 149) is positioned near the catalytic triad, with their
carboxylate groups interacting with each other. The carboxylate groups are located approximately 8 Å from the Oγ
of Ser 150, and lie at the bottom of the active site. An asparagine (Asn 55) is oriented such that it forms a hydrogen
bond to each of the two carboxylate groups of the glutamic acid pair, at Nδ2 (Asn) to Oε (Glu) distances of 3.0 and
3.6 Å, respectively. In addition, the two carboxylates interact with each other via hydrogen bonds. This configuration
of two glutamic acid residues and an asparagine is conserved between pPPCA, CPW and CPY (see Table 4), and has
been implicated in regulating the low pH optimum of the carboxypeptidase activity of the serine
carboxypeptidases (Liao et al. (1992), infra). Biochemical data have suggested that a functional group with an apparent
pKa of 5.5 binds the C-terminal carboxylate group of peptide substrates and is responsible for the
observed pH optimum of 5.5 (reviewed in Breddam et al. (1986), infra; Rawlings & Barrett (1994), infra). Together
with their structural data, Liao and colleagues (Liao et al. (1992), infra) have suggested that at pH 5.5 or below, one or
both glutamates must be uncharged, while at a pH higher than 5.5 one or both of these mutually opposed carboxylates
may become deprotonated, resulting in unfavorable electrostatic interactions. This would disturb
the hydrogen bonding pattern or cause structural perturbations, accounting for the observed increase in Km for peptide
substrates at high pH. In pPPCA the orientation of this pair of glutamic acids, as well as that of the asparagine, is
essentially identical to that of the equivalent residues in CPW and CPY (see Table 4), even though the pPPCA structure has
been determined at pH 8. The CPW and CPY structures have been determined at pH 5.7 and at pH 6.5-7.0. Thus, our
structure appears to rule out large pH-induced conformational changes of these three residues at least up to a pH value
2.5 units above the optimum for carboxypeptidase activity. However, the high degree of conservation of these residues
does indicate some role in a characteristic shared by all three enzymes.
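The pH argument can be made quantitative with the Henderson-Hasselbalch relation for a single ionizable group, fraction deprotonated = 1 / (1 + 10^(pKa - pH)). The sketch below is an idealization: it assumes one independent group with the apparent pKa of 5.5 cited above and ignores the coupling between the two glutamates, whose individual pKa values are not given here.

```python
def fraction_deprotonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of an acidic group in its charged (deprotonated) form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


PKA_APPARENT = 5.5  # apparent pKa quoted in the text

# pH values mentioned in the text: the activity optimum, the CPW/CPY crystal
# conditions and the pH at which the pPPCA structure was determined.
for ph in (5.5, 5.7, 6.5, 7.0, 8.0):
    print(f"pH {ph:.1f}: {fraction_deprotonated(ph, PKA_APPARENT):.3f} deprotonated")
```

Under this idealization the group is half deprotonated at pH 5.5 and more than 99% deprotonated at pH 8, so a structure determined at pH 8 samples the charged state; that the Glu/Asn geometry is nonetheless unchanged is what argues against a large pH-induced rearrangement of these residues.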
From our comparison it is clear that the enzymatic machinery in the PPCA precursor form is in a conformation
virtually identical to that found in the fully active CPW and CPY enzymes. On this basis, the conformation of the
enzymatic machinery found in pPPCA is expected to faithfully represent the conformation that will be found in the
active PPCA.
Active Site Substrate Specificity. PPCA has a substrate preference for hydrophobic residues in the P1 and/or
P1' binding pockets (Jackman et al., Hypertension 21:925-928 (1993)). In CPW the P1' pocket was identified as consisting
of two tyrosine residues (Tyr 60 and Tyr 239), which form a long channel capped at the end by two acidic residues (Glu 272 and
Glu 398) (Liao et al. (1992), infra). This explains the strong preference of this enzyme for Arg and Lys as
the leaving group (Breddam et al., Carlsberg Res. Commun. 52:297-311 (1987)). In CPY a similarly shaped pocket is
formed by the residues Thr 60, Tyr 256, Leu 272 and Met 398 (Endrizzi et al. (1994), infra). In PPCA the analogous
residues are Tyr 247 and Asp 64, which form the sides of the pocket, with Met 430 and Thr 304 at the far end. This is
reasonably consistent with an overall preference of PPCA for a hydrophobic leaving group.

Inactivation Mechanism of the Precursor Form. During the maturation step of the PPCA precursor form, at
most residues 285-298, forming the 'excision' peptide, are removed by an as yet unidentified protease(s). In vitro,
the maturation event can be mimicked by digestion with trypsin, probably utilizing positions Arg 284 as well as Arg 292
and/or Arg 298. The residues forming the 'excision' peptide adopt distinctly different conformations in the two
crystallographically distinct monomers forming the PPCA dimer in our crystal structure. Yet in both monomers this
polypeptide region extends out from the protein surface and is virtually completely solvent and protease accessible
(Figure 9). Arg 284 and Arg 292 are particularly well exposed. The main chain atoms of Arg 298 are less accessible,
being sandwiched between the strand Mβ2 and a loop N-terminal to helix Cα6, while a salt bridge with Glu 264 renders
the side chain atoms of Arg 298 partially solvent inaccessible.
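Trypsin's specificity explains why arginines are the candidate in vitro cut sites. A minimal sketch, assuming the common simplified rule that trypsin cleaves C-terminal to Lys or Arg unless the next residue is Pro; the PPCA sequence itself (Figure 13) is not reproduced here, so the peptide in the usage line is hypothetical and the helper `trypsin_sites` is illustrative.

```python
def trypsin_sites(seq: str, first_residue: int = 1) -> list[tuple[str, int]]:
    """Return (residue, number) for each Lys/Arg after which trypsin may cut.

    Simplified rule: cleave after K or R unless the following residue is P.
    `first_residue` is the sequence number of seq[0].
    """
    sites = []
    for i in range(len(seq) - 1):  # the C-terminal residue has no peptide bond after it
        if seq[i] in "KR" and seq[i + 1] != "P":
            sites.append((seq[i], first_residue + i))
    return sites


# Hypothetical peptide (not the PPCA sequence): R3 and R8 qualify, K5 is followed by Pro.
print(trypsin_sites("GARSKPLRM"))
```

A sequence rule only yields candidates; as the paragraph above notes, the accessibility of each arginine in the folded precursor determines which sites are actually cut.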

The active site cleft is blocked by numerous residues from the maturation subdomain in the precursor form of
PPCA. The catalytic triad is rendered solvent inaccessible by residues Asn 275, Ile 276 and Phe 277. These residues
are part of the polypeptide Asp 272-Phe 277, which we call the 'blocking' peptide. This peptide is held down
predominantly by hydrophobic contacts of Leu 273, Ile 276 and Phe 277 with the core domain residues Gly 57, Cys 60,
Leu 180, Leu 190, Val 191, Leu 232, Val 235, Ile 246, Leu 280, Leu 282, Met 299 and Ala 373 (Figure 10). In addition,
residue Asn 275 of the blocking peptide appears to fill what might be part of the P1 binding pocket in the mature form.
Further inspection of the blocking peptide suggests that Gly 274, with Ramachandran angles φ = 66° and ψ = 28°, might
play a central role in the strand blocking the active site. A glycine at this position appears critical to allow the
polypeptide chain to adopt a conformation with its main chain at a safe distance from the catalytic triad. This might help
the blocking peptide to assume a conformation resistant to autocatalysis. The P1' binding pocket appears to
be filled snugly by Pro 301, which interacts with Thr 304, Tyr 247, Cys 60 and Cys 334. Thus substrate binding is not
possible in the precursor form, because the substrate binding pockets are inaccessible.
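The significance of φ = 66°, ψ = 28° is that φ is positive, placing Gly 274 in the left-handed (αL) region, which is sterically unfavorable for most non-glycine residues but readily accessible to glycine. The sketch below is a deliberately coarse classifier; the region boundaries are approximate assumptions chosen for illustration, not those of any particular validation program, and `ramachandran_region` is a hypothetical helper.

```python
def ramachandran_region(phi: float, psi: float) -> str:
    """Coarse Ramachandran region for backbone angles in degrees (-180..180).

    Boundaries are rough, illustrative boxes only.
    """
    if phi > 0:
        # Positive phi: left-handed region, generally accessible only to glycine.
        return "alpha-L (positive phi)"
    if -180 <= phi <= -45 and (psi >= 90 or psi <= -150):
        return "beta"
    if -160 <= phi <= -20 and -120 <= psi <= 50:
        return "alpha-R"
    return "other"


print(ramachandran_region(66, 28))    # Gly 274 in the precursor
print(ramachandran_region(-60, -45))  # typical right-handed helix
```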
We conclude that the inactivation mechanism of PPCA is based on blocking of the active site, and not upon
changes in the position of functional groups involved in catalysis/transition state stabilization. The P1, P2 and P1'
binding pockets are all rendered solvent inaccessible. The function of the blocking peptide seems to be to render the
catalytic triad, as well as the region around the P1 and P2 binding pockets, solvent inaccessible. The blocking peptide,
however, does not assume the conformation that a peptide substrate would adopt. It is carefully positioned in a manner
which is different from that of a productive substrate, thereby avoiding being cleaved by the nearby catalytic residues, which
are correctly poised for catalysis. A crucial observation is that the excision peptide itself does not bind in the active site
cleft. Hence, removal of the excision peptide alone is not sufficient to allow solvent or substrate access to the
active site.
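Solvent accessibility, as used in this discussion, is commonly quantified as the solvent-accessible surface area. The text does not say which program was used; the sketch below implements the Shrake-Rupley test-point method with NumPy, assuming a 1.4 Å water probe and caller-supplied atomic radii. Comparing per-atom areas with and without the maturation-subdomain atoms present would show how much of the triad the blocking peptide buries. The function names and the two-atom example are illustrative.

```python
import numpy as np


def sphere_points(n: int) -> np.ndarray:
    """Approximately uniform unit-sphere points from a golden-angle spiral."""
    k = np.arange(n) + 0.5
    z = 1.0 - 2.0 * k / n
    rho = np.sqrt(1.0 - z * z)
    angle = np.pi * (3.0 - np.sqrt(5.0)) * k  # golden-angle increment
    return np.column_stack((rho * np.cos(angle), rho * np.sin(angle), z))


def shrake_rupley(coords, radii, probe=1.4, n_points=960):
    """Per-atom solvent-accessible surface area (Å^2) by the Shrake-Rupley method."""
    coords = np.asarray(coords, dtype=float)
    expanded = np.asarray(radii, dtype=float) + probe  # atom radius + probe radius
    unit = sphere_points(n_points)
    areas = np.zeros(len(coords))
    for a in range(len(coords)):
        # Test points on the probe-expanded sphere around atom a.
        pts = coords[a] + expanded[a] * unit
        exposed = np.ones(n_points, dtype=bool)
        for b in range(len(coords)):
            if b != a:
                # A point is buried if it lies inside any other expanded sphere.
                d2 = ((pts - coords[b]) ** 2).sum(axis=1)
                exposed &= d2 >= expanded[b] ** 2
        # Exposed fraction of points times the area of the expanded sphere.
        areas[a] = 4.0 * np.pi * expanded[a] ** 2 * exposed.mean()
    return areas


# Two hypothetical carbon-like atoms (radius 1.7 Å) 3.0 Å apart partially bury each other.
print(shrake_rupley([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]], [1.7, 1.7]))
```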

Proposed Maturation Event and Extent of Conformational Rearrangement. The active site of the precursor
of PPCA appears to be fully blocked by 49 residues of the maturation subdomain, as shown in Figure 11. Based on the
precursor structure and the comparison with CPW and CPY, it is proposed that a region comprising approximately
residues 254-284 rearranges to free the P1 and P2 binding sites, while residues 299-302 rearrange to free the P1' binding
pocket. The linker connecting these two segments of polypeptide chain is the 14-amino-acid excision peptide Met 285-Arg 298.
The extent of the rearrangement is likely to be limited by a disulfide bridge between Cys 253 and Cys 303, which
is conserved in the serine carboxypeptidase family. This critical disulfide serves to keep the secondary structure
elements together at the far end of the P1' pocket.

An interesting pair of salt bridges is observed between Arg 262, Asp 300, Glu 264 and Arg 298, four residues
located on strands Mβ1 and Mβ3 of the mixed β-sheet found in the maturation subdomain. This cluster of residues is
strategically positioned at the base of the excision peptide, close to the core domain, 'shielding' the mixed β-sheet via
side chain interactions (see Figure 11). These residues are strictly conserved among the human, mouse and chicken
PPCAs (Galjart et al. (1991), infra). This charge cluster may be affected by a shift from neutral to acidic pH. Arrival
in the endosome/lysosome is expected to result in protonation of the Asp or the Glu residue or both, resulting in
unfavorable electrostatic interactions and destabilization of this charge cluster. This in turn is expected to promote partial
unfolding of the maturation subdomain, allowing easier access to additional potential cleavage sites, and stimulating removal
of the 'blocking' peptide, which fills the active site in the precursor, from the active site.

A similar double salt bridge has been observed in the aspartic proteinase zymogen pepsinogen between the
proenzyme segment (Arg 8P) and the enzyme (Arg 308, Glu 13, Asp 304).

The maturation mechanism for pPPCA appears to be novel among proteases for which the three-dimensional
structure of the zymogen is known. The catalytic triad in the precursor form is in a catalytically competent
conformation. Enzymatic activity is prevented by a 'blocking' peptide. The blocking peptide, however, is distinct from
the excision peptide and is not excised during maturation. This distinguishes PPCA from the
other known maturation mechanisms: after disappearance of the excision peptide, up to 35 residues filling the
active site cleft in the PPCA precursor must rearrange, without being cleaved off, to render the catalytic triad solvent
accessible (see Figure 12). Removal of the excision peptide, and possibly a shift to lower pH in the endosome/lysosome,
appears to be a trigger for this event. The mechanism does not appear to be autocatalytic, as uptake experiments with
cultured galactosialidosis fibroblasts have shown that a mutant PPCA with the catalytic Ser 150 mutated to Ala is
properly targeted and processed. It retains its protective function and, except for the loss of catalytic activity, is
biochemically indistinguishable from the wild-type enzyme (Galjart et al. (1991), infra). Surprisingly, the maturation
mechanisms of the serine carboxypeptidases PPCA, CPW and CPY may all differ from each other as well. This is
clearest for CPY, in which a 91-residue polypeptide is cleaved off N-terminally to convert the zymogen to an active
enzyme (Winther and Sorensen, Proc. Natl. Acad. Sci. USA 88:9330-9334 (1991)), as opposed to the excision of a
peptide from within the zymogen, generating a two-chain active form, as is the case for PPCA and CPW.

Looking at the hydrolase fold family, the catalytic triad is housed in the core domain, and the various cap
domains modulate the biological function by influencing entirely different properties, such as: (i) enzyme kinetics,
exemplified by the interfacial activation of lipases (Smith et al., Curr. Opinion in Structural Biology 2:490-496 (1992));
(ii) substrate channeling, as proposed for acetylcholinesterase (Sussman et al. (1991), infra); (iii) substrate
recognition, as proposed for dehalogenase (Franken et al. (1991), infra) and for CPY and CPW (Endrizzi et al.
(1994), infra); and (iv) enzyme inactivation, in the case of PPCA.

Biological Implications. Deficiency of the protective protein/cathepsin A (PPCA) in humans results in the
lysosomal storage disease galactosialidosis. PPCA is thought to form a multi-enzyme complex with β-galactosidase and
neuraminidase in the lysosomes, protecting these glycosidases in their harsh, acidic and protease-rich environment.
PPCA shares 30% sequence identity with the wheat serine carboxypeptidase (CPW) and the yeast serine carboxypeptidase
(CPY). It has been shown that PPCA in the precursor form is inactive, but that upon maturation, which entails excision of a 2 kDa
peptide, carboxypeptidase activity is released.

The precursor structure reveals an inactivation mechanism that has not been seen before in any of the other
known zymogen structures of proteases (available for the serine-, metallo- and aspartic protease classes). The catalytic
triad seems to have an arrangement poised for catalysis. However, the triad is rendered solvent and substrate
inaccessible by a strand from the maturation subdomain binding in the active site cleft. Surprisingly, this strand, called
the 'blocking' peptide, does not overlap with the 2 kDa 'excision' peptide. Hence, after removal of the excision peptide,
up to 35 additional residues must rearrange in order to unblock the active site cleft. A strategically positioned pair of
salt bridges, comprising Arg 262, Arg 298, Glu 264 and Asp 300 at the base of the excision peptide, is expected to
become destabilized at low pH, unraveling this region of the structure, allowing easier access to cleavage sites
and/or promoting the rearrangement event.
A number of research groups are currently involved in designing enzyme and gene therapy procedures for
several lysosomal storage diseases. Insight into the three-dimensional structure, protein functioning and stability of
PPCA, the first enzyme of known structure associated with a lysosomal storage disease and the third human lysosomal
protein structure to be determined, may prove useful in future designs of an adequate therapy procedure for galactosialidosis.
Information from the three-dimensional structure of PPCA might also aid in designing an engineered form of PPCA
with increased stability and a longer half-life.
Table 1: X-ray Data Collection Statistics

resolution                                      32.27-2.2 Å
wavelength                                      1.08 Å
space group                                     P2₁2₁2
unit cell                                       a = 115.04, b = 148.11, c = 80.97 Å
temperature of data collection                  -178 °C
No. of observed reflections                     436,709
No. of unique reflections                       67,740
completeness of all data                        95.7%
R_sym for all data                              5.1%
completeness of outer shell (2.26-2.20 Å)       87.0%
R_sym in outer shell (2.26-2.20 Å)              13.0%

$R_{sym} = \sum_h \sum_i |I_i(h) - \langle I(h) \rangle| \, / \, \sum_h \sum_i I_i(h)$, where $I_i(h)$ is the i-th observation for reflection h
and $\langle I(h) \rangle$ is the weighted mean of all the observations.
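The R_sym definition above can be illustrated in a few lines. A minimal sketch, assuming the observations have already been mapped to their unique (symmetry-reduced) reflection indices; for simplicity ⟨I(h)⟩ is taken as the unweighted mean, whereas the footnote specifies a weighted mean. The function `r_sym` and the toy data are illustrative.

```python
from collections import defaultdict


def r_sym(observations):
    """R_sym = sum_h sum_i |I_i(h) - <I(h)>| / sum_h sum_i I_i(h).

    `observations` is an iterable of (hkl, intensity) pairs, with hkl already
    reduced to the unique reflection. <I(h)> is the unweighted mean here.
    """
    by_reflection = defaultdict(list)
    for hkl, intensity in observations:
        by_reflection[hkl].append(intensity)
    numerator = 0.0
    denominator = 0.0
    for intensities in by_reflection.values():
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


# Toy data: two reflections, each measured twice.
obs = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0), ((0, 1, 0), 50.0), ((0, 1, 0), 50.0)]
print(f"R_sym = {100 * r_sym(obs):.1f}%")  # 3.2%
```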
Table 2: Course of Model Building

                              nr. of     nr. of side   mask volume    Rfactor   Rfree   CC      CCfree
model                         Cα atoms   chains        (10^4 Å^3)     (%)       (%)
                              {statistics using data between 8.0 and 3.0 Å}

molecular replacement (mr)    331        125           -              54.2      55.3    0.243   0.244
rigid body ref. (rmr)                                                 52.6      52.9    0.287   0.318
calculate NCS matrix
best monomer (bm)             294        228           -              55.9      57.4    0.320   ?
rigid body ref.                                                       53.5      55.0    ?       0.328
update NCS matrix
bmc1 (mask 1)                 373        258           10.8           49.9      51.3    0.403   0.424
bmc2 (mask 1)                 405        277           10.8           48.6      48.4    0.443   0.478
bmc3 (mask 2)                 411        307           9.99           47.1      48.6    0.471   0.491
rigid body ref.                                                       46.9      48.4    0.476   0.492
positional ref. (pbmc3)                                               39.4      44.7    0.622   0.562
update NCS matrix
bmc4 (mask 1)                 412        327           10.8           41.7      43.1    0.584   0.585
bmc5 (mask 3)                 435        387           8.88           39.8      40.6    0.621   0.623
bmc6 (mask 4)                 442        413           9.11           38.4      40.2    0.647   0.637
(? = illegible in the source copy)

Summary of the bootstrapping procedure. The resulting models are listed chronologically, starting
with the molecular replacement solution: mr (molecular replacement), bm (best monomer core), and
the bootstrapping cycles bmc1 through bmc6. The following statistics are given for the various models:
the number of Cα atoms built per monomer; the number of correct side chains incorporated per monomer;
and the volume of the molecular mask used during the averaging, if applicable. The quality of each model
is assessed using the Rfactor, Rfree, CC and CCfree values calculated by X-PLOR for data between 8.0 and 3.0 Å.
After positional refinement of model bmc3, both monomers were made equivalent by taking one monomer
and generating the non-crystallographically related one.
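The model-quality statistics in Table 2 compare observed and calculated structure-factor amplitudes. The sketch below uses the textbook definitions R = Σ| |Fo| - k|Fc| | / Σ|Fo| and the Pearson correlation between |Fo| and |Fc|, assuming a single overall scale k fitted by least squares; X-PLOR's exact scaling and CC conventions may differ. The "free" variants are the same quantities evaluated on a set of reflections excluded from refinement. Function names and toy amplitudes are illustrative.

```python
import numpy as np


def scale_factor(f_obs, f_calc):
    """Least-squares scale k minimising sum (|Fo| - k|Fc|)^2."""
    fo = np.abs(np.asarray(f_obs, dtype=float))
    fc = np.abs(np.asarray(f_calc, dtype=float))
    return (fo * fc).sum() / (fc * fc).sum()


def r_factor(f_obs, f_calc):
    """Crystallographic R-factor after scaling |Fc| onto |Fo|."""
    fo = np.abs(np.asarray(f_obs, dtype=float))
    fc = np.abs(np.asarray(f_calc, dtype=float))
    k = scale_factor(fo, fc)
    return float(np.abs(fo - k * fc).sum() / fo.sum())


def correlation(f_obs, f_calc):
    """Pearson correlation coefficient between |Fo| and |Fc| (scale-independent)."""
    fo = np.abs(np.asarray(f_obs, dtype=float))
    fc = np.abs(np.asarray(f_calc, dtype=float))
    return float(np.corrcoef(fo, fc)[0, 1])


# Toy amplitudes: a model that is perfect up to scale gives R = 0 and CC = 1 (to rounding).
fo = [10.0, 20.0, 30.0]
print(r_factor(fo, [1.0, 2.0, 3.0]), correlation(fo, [1.0, 2.0, 3.0]))
```

Falling R and rising CC across the bootstrapping cycles are what the table uses to track model improvement.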
Table 3: Current Status of the Model

statistics for the data used in refinement:

resolution (Å)    Rfactor (%)    completeness (%)
8.0-4.3           22.4           85.7
4.3-3.5           19.0           89.1
3.5-3.0           20.6           89.1
3.0-2.8           21.3           87.9
2.8-2.6           22.3           86.1
2.6-2.4           22.2           84.0
2.4-2.3           ?              ?
2.3-2.2           24.0           78.3
8.0-2.2           21.3           ?
(? = illegible in the source copy)

molecules in the asymmetric unit:                 2
residues (out of 904 possible):                   902
sugars:                                           6
waters:                                           296
r.m.s.d. bond length (Å):                         0.012
r.m.s.d. bond angles (°):                         1.72
average B-values for main chain atoms (Å²):       16.6
average B-values for side chain atoms (Å²):       18.3
Table 4

Superposition of the proposed catalytic machinery of the serine carboxypeptidases with known three-
dimensional structure: PPCA, CPW (Liao et al., Biochemistry 31:9796-9812 (1992)) and CPY (Endrizzi
et al., Biochemistry 33:11106-11120 (1994)).

PPCA                  CPW          Δ PPCA-CPW (Å)    CPY          Δ PPCA-CPY (Å)

Catalytic triad:
Ser 150               Ser 146      [illegible]       Ser 146      [illegible]
His 429               His 397      [illegible]       His 397      [illegible]
Asp 372               Asp 338      [illegible]       Asp 338      [illegible]
(per-atom deviations for the catalytic triad are not legible in the source copy)

Proposed oxyanion hole (formed by two backbone amides):
Gly 57    N           Gly 53       0.1               Gly 53       0.5
          Cα                       0.2                            0.4
          C                        0.1                            0.4
          O                        0.3                            0.8
Tyr 151   N           Tyr 147      0.3               Tyr 147      0.2
          Cα                       0.2                            0.1
          C                        0.3                            0.2
          O                        0.5                            0.2

Proposed regulation of pH-dependent peptidase activity (deviations averaged over all atoms):
Asn 55                Asn 51       0.2               Asn 51       0.2
Glu 69                Glu 65       0.3               Glu 65       0.7
Glu 149               Glu 145      0.4               Glu 145      0.4

The residues forming the proposed catalytic machinery are strictly conserved between the three serine
carboxypeptidases. The deviation (in Å) between each PPCA atom and the equivalent atom in
CPW or CPY after superposition is given.
What Is Claimed Is:

1. A method for crystallizing a human protective protein/cathepsin A (PPCA) or
precursor human protective protein/cathepsin A (pPPCA), comprising

(a) providing a purified PPCA or pPPCA;
(b) crystallizing the purified PPCA or pPPCA using a hanging drop or diffusion
method, to provide crystallized PPCA or pPPCA having biological activity,

wherein the crystallized PPCA or pPPCA is resolvable using x-ray crystallography to obtain
x-ray diffraction patterns suitable for three-dimensional structure determination of the PPCA or
pPPCA.

2. A method according to claim 1, wherein said PPCA or pPPCA has at least one
biological activity selected from the group consisting of enzyme protecting activity, enzyme
modulating activity and peptide hydrolyzing activity.

3. A method according to claim 1, wherein said crystallization step is done under
conditions of purified PPCA or pPPCA; 2-30% PEG 400-10,000; precipitating salt; buffers; and pH
7-9.

4. A method according to claim 3, wherein the crystallization conditions are PPCA or
pPPCA; 5-14% PEG 8000, 40-80 mM tromethamine, 0.05-2.0 mM NaN3 and pH 8.0-8.3.

5. A crystallized PPCA or pPPCA, or at least one subdomain thereof, provided by a
method according to claim 1.

6. A method for providing an atomic model of a PPCA or pPPCA, comprising

(a) providing a computer readable medium having stored thereon atomic
coordinate/x-ray diffraction data of said PPCA or pPPCA in crystalline form, said data sufficient to
model the three-dimensional structure of said PPCA, said pPPCA, or at least one subdomain thereof;

(b) analyzing, on a computer using at least one subroutine executed in said computer,
the atomic coordinate/x-ray diffraction data from (a) to provide data output defining an atomic model
of said PPCA or said pPPCA, said analyzing utilizing at least one computing algorithm selected from
the group consisting of data processing and reduction, auto-indexing, intensity scaling, intensity
merging, amplitude conversion, truncation, molecular replacement, molecular alignment, molecular
refinement, electron density map calculation, electron density modification, electron map
visualization, model building, rigid body refinement, positional refinement; and

(c) obtaining atomic model output data defining the three-dimensional structure of
said PPCA, pPPCA or at least one subdomain thereof.

7. A method according to claim 6, wherein said computer readable medium further has
stored thereon data corresponding to a nucleic acid sequence or amino acid sequence data
comprising at least one structural domain or a functional domain of a PPCA or pPPCA
corresponding to a portion of the amino acid sequences of Figures 13 or 14, and wherein said
analyzing step further comprises analyzing said sequence data.
8. A computer readable medium having stored thereon atomic model data of said PPCA
or pPPCA as the model output data produced by a method according to claim 6.

9. A computer-based system for providing atomic model data of the three-dimensional
structure of a PPCA or a pPPCA, comprising the following elements:

(a) a computer readable medium having stored thereon atomic coordinate/x-ray
diffraction data of said PPCA or pPPCA or at least one subdomain thereof;

(b) at least one computing subroutine that, when executed in a computer, causes the
computer to analyze the atomic coordinate/x-ray diffraction data from (a) to provide data output
defining an atomic model of said PPCA or pPPCA, said analyzing utilizing at least one computing
subroutine selected from the group consisting of data processing and reduction, auto-indexing,
intensity scaling, intensity merging, amplitude conversion, truncation, molecular replacement,
molecular alignment, molecular refinement, electron density map calculation, electron density
modification, electron map visualization, model building, rigid body refinement, positional
refinement; and

(c) retrieval means for obtaining atomic model output data defining the three-dimensional
structure of said PPCA, pPPCA or at least one subdomain thereof.

10. A computer-based system according to claim 9, wherein said computer readable
medium further has stored thereon data corresponding to a nucleic acid sequence or amino acid
sequence data comprising at least one structural domain or a functional domain of a PPCA or
pPPCA corresponding to a portion of the amino acid sequences of Figures 13 or 14, and wherein said
at least one subroutine further includes analyzing said sequence data.

11. A computer readable medium, having stored thereon atomic model data of a PPCA,
pPPCA, or at least one subdomain thereof, produced by a computer system according to claim 9.

12. A method for providing a computer atomic model of a ligand of a PPCA or pPPCA,
comprising

(a) providing a computer readable medium according to claim 11, having stored
thereon atomic model data of a PPCA, a pPPCA or at least one subdomain thereof;

(b) providing a computer readable medium having stored thereon atomic model data
sufficient to generate atomic models of potential ligands of PPCA or pPPCA;

(c) analyzing on a computer, using at least one subroutine executed in said computer,
the atomic model data from (a) and the ligand data from (b), to determine binding sites of PPCA or
pPPCA and to provide data output defining an atomic model of a ligand of said PPCA, pPPCA, or
at least one subdomain thereof, said analyzing utilizing computing subroutines selected from the
group consisting of data processing and reduction, auto-indexing, intensity scaling, intensity
merging, amplitude conversion, truncation, molecular replacement, molecular alignment, molecular
refinement, electron density map calculation, electron density modification, electron map
visualization, model building, rigid body refinement, positional refinement; and
(d) obtaining atomic model output data defining the three-dimensional structure of
a ligand of said PPCA, pPPCA or at least one subdomain thereof.

13. A computer readable medium having stored thereon the model output data produced
by a method according to claim 12.

14. An isolated PPCA or pPPCA ligand, corresponding to the physical molecule of the
atomic model of the ligand produced by a method according to claim 12.

15. A computer-based system for providing an atomic model of a ligand of a PPCA or
pPPCA, comprising the following elements:

(a) a computer readable medium having stored thereon atomic model data of a PPCA
or pPPCA;

(b) a computer readable medium having stored thereon atomic model data sufficient
to generate atomic models of potential ligands of PPCA or pPPCA;

(c) at least one computing subroutine for analyzing on a computer the atomic model
data of PPCA or pPPCA from (a) and the ligand data from (b), to determine binding sites of PPCA
or pPPCA and to provide data output defining atomic models of potential ligands of PPCA or
pPPCA, said analyzing utilizing at least one computing subroutine selected from the group consisting
of data processing and reduction, auto-indexing, intensity scaling, intensity merging, amplitude
conversion, truncation, molecular replacement, molecular alignment, molecular refinement, electron
density map calculation, electron density modification, electron map visualization, model building,
rigid body refinement, positional refinement; and

(d) retrieval means for obtaining atomic model output data defining the atomic
models of potential ligands of PPCA or pPPCA.

16. A computer readable medium, comprising atomic model output data of a potential
ligand of PPCA or pPPCA, said data produced by a method according to claim 15.

17. An isolated PPCA or pPPCA ligand, corresponding to the physical molecule of the
atomic model of a ligand produced by a computer system according to claim 15.

18. A crystallized pPPCA, having the atomic coordinates presented in Figures 23.1-23.41.
Protective Protein/Cathepsin A and Precursor: Crystallization, X-Ray Diffraction, Three-Dimensional
Structure Determination and Rational Drug Design

Background of the Invention

Statement as to Rights to Inventions Made Under
Federally-Sponsored Research and Development

Part of the work performed during development of this invention utilized U.S. Government funds. The U.S.
Government has certain rights in this invention.
Field of the Invention 

The present invention is in the fields of molecular biology, protein purification, protein crystallization, x-ray
diffraction analysis, three-dimensional structure determination and rational drug design (RDD). The present invention
provides crystallized protective protein/cathepsin A (PPCA) and its precursor (pPPCA). The crystallized PPCA or
pPPCA is analyzed by x-ray diffraction techniques. The resulting x-ray diffraction patterns are of sufficiently high
resolution to be useful for determining the three-dimensional structure of the PPCA or pPPCA protein, and for RDD.

Related Background Art

Deficiency of the human protective protein/cathepsin A (PPCA, also known as human protective protein or HPP) has been
identified as the primary genetic defect underlying galactosialidosis (d'Azzo et al., Proc. Natl. Acad. Sci. U.S.A. 79:4535-4539
(1982)), a lysosomal storage disease inherited as an autosomal recessive trait. Patients with this disorder are
diagnosed as having drastically reduced β-galactosidase and neuraminidase activities in their cell lysosomes. Examples
of lysosomal storage diseases are presented in Table 316-1 of Braunwald et al., eds., Harrison's Principles of Internal
Medicine, 11th Ed., pp. 1661-1671, McGraw Hill Book Co., New York (1987); as well as Wenger et al., Biochem.
Biophys. Res. Commun. 82:589-595 (1978); Tettamanti et al., eds., Sialidases and Sialidosis, Perspectives in Inherited
Metabolic Diseases, Vol. 4, Edi. Ermes, Milano (1981), pp. 261-279 and 379-395; and van Diggelen et al., Lancet
2:804 (1987), which references are entirely incorporated herein by reference.

Researchers have proposed that one of PPCA's functions is to stabilize β-galactosidase and neuraminidase in
a multi-enzyme complex, which complex is deficient in galactosialidosis patients (d'Azzo et al. (1982), infra; Hoogeveen
et al. (1983), infra). Evidence for this protective function comes from studies showing that PPCA is taken up from the
culture medium by galactosialidosis fibroblasts and that PPCA restores both β-galactosidase and neuraminidase activities
to these fibroblasts (d'Azzo et al. (1982), infra).

The cDNA for PPCA directs the synthesis of a 452-amino-acid precursor PPCA (pPPCA) (Figure 13) with a
molecular weight of 54 kDa (Galjart et al., Cell 54:755-764 (1988)). The amino acid sequences of PPCA (Figure 14)
and pPPCA (Figure 13) contain two glycosylation sites (Asn 117 and Asn 305), both of which are glycosylated in
cultured fibroblasts and cells over-expressing PPCA or pPPCA. pPPCA dimerizes soon after synthesis in the
endoplasmic reticulum (ER) (Zhou et al., EMBO J. 10:4041-4048 (1991)).
Lysosomal PPCA has cathepsin A/deamidase/esterase activities which are exerted in vitro on a specific subset
of bioactive peptides. Non-limiting examples of those hydrolyzed by PPCA are: substance P and substance P-free acid;
oxytocin and oxytocin-free acid; neurokinin A; angiotensin I; and bradykinin (Jackman (1990), infra). Furthermore, the
enzyme inactivates endothelin I activity in rat smooth muscle cells and normal human tissues. This activity was deficient
in liver from a galactosialidosis patient (Itoh (1995), infra; Jackman et al., J. Biol. Chem. 267:2872-2875 (1992)).

Endothelins (ET-1, ET-2 and ET-3) are potent vasoconstrictors and elevate blood pressure in mammals. They
also influence cell proliferation and hormone production and have been implicated in cardiovascular disorders ranging
from hypertension to stroke to ischemic heart disease (Rubanyi and Polokoff, Pharmacol. Rev. 46:325-415 (1994)).

The three-dimensional structure of PPCA or pPPCA has not previously been published. Such a structure could
delineate specific biological activities, and ligands useful as therapeutics, for PPCA-related pathologies. Accordingly,
there is a need to provide three-dimensional structures of at least one PPCA, pPPCA or ligand for diagnosis or therapy
of PPCA-related pathologies.
Summary of the Invention
The present invention provides methods of expressing, purifying and crystallizing a human protective
protein/cathepsin A (PPCA) and its precursor, precursor protective protein/cathepsin A (pPPCA). The present invention
also provides methods for obtaining crystallized PPCA or pPPCA that can be analyzed to obtain x-ray diffraction
patterns of sufficiently high resolution to be useful for three-dimensional structure determination of the protein.

The x-ray diffraction patterns can be either analyzed directly to provide the three-dimensional structure (if of
sufficiently high resolution), or atomic coordinates for the crystallized PPCA or pPPCA, as provided herein, can be
used for structure determination. The x-ray diffraction patterns obtained by methods of the present invention
and provided on computer readable media are used to provide electron density maps. The amino acid sequence is also
useful for three-dimensional structure determination. The data are then used in combination with phase determination
(e.g., using multiple isomorphous replacement (MIR) or molecular replacement techniques) to generate electron density
maps of a PPCA or a pPPCA, using a suitable computer system.
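The role of phase determination in turning diffraction data into an electron density map can be shown with a one-dimensional toy model. The sketch below, assuming NumPy and point "atoms" in a unit cell of length 1, computes structure factors F(h) = Σ_j f_j exp(2πi h x_j) and then the Fourier synthesis ρ(x) = Σ_h |F(h)| exp(iφ(h)) exp(-2πi h x). Measured data supply only |F(h)|, so φ(h) must come from MIR, molecular replacement or a model. The function names and atom positions are illustrative.

```python
import numpy as np


def structure_factors(positions, weights, h_max):
    """F(h) = sum_j f_j exp(2*pi*i*h*x_j) for h = -h_max..h_max (1-D toy cell)."""
    h = np.arange(-h_max, h_max + 1)
    x = np.asarray(positions, dtype=float)
    f = np.asarray(weights, dtype=float)
    F = (f[None, :] * np.exp(2j * np.pi * h[:, None] * x[None, :])).sum(axis=1)
    return h, F


def fourier_synthesis(h, amplitudes, phases, n_grid=200):
    """rho(x) = sum_h |F(h)| exp(i*phi(h)) exp(-2*pi*i*h*x) on a grid over one cell."""
    x = np.arange(n_grid) / n_grid
    F = np.asarray(amplitudes) * np.exp(1j * np.asarray(phases))
    rho = (F[None, :] * np.exp(-2j * np.pi * x[:, None] * h[None, :])).sum(axis=1)
    return x, rho.real  # imaginary part cancels because F(-h) = conj(F(h))


# Two toy atoms at x = 0.25 (weight 1) and x = 0.60 (weight 2).
h, F = structure_factors([0.25, 0.60], [1.0, 2.0], h_max=10)
x, rho_true = fourier_synthesis(h, np.abs(F), np.angle(F))       # correct phases
x, rho_zero = fourier_synthesis(h, np.abs(F), np.zeros(len(h)))  # phases discarded
print("highest peak, correct phases:", x[np.argmax(rho_true)])
print("highest peak, zero phases:   ", x[np.argmax(rho_zero)])
```

With the correct phases the highest peak falls on the heavier atom at x = 0.60; with the phases discarded the same amplitudes put the highest peak at the origin, which is why amplitudes alone cannot yield an interpretable map.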

The electron density maps, provided by analysis of either the x-ray diffraction patterns or working backwards
from the atomic coordinates provided herein, are then fitted using suitable computer algorithms to generate secondary,
tertiary and/or quaternary domains of a PPCA or a pPPCA, which domains are then used to provide an overall
three-dimensional structure, as well as expected binding and active sites of the PPCA or pPPCA. pPPCA has some of the
active and binding sites of PPCA, except for changes in structure due to the presence of the portion of the pPPCA which
is deleted during maturation to PPCA (e.g., residues 285-298 of Figure 13).

Structure determination methods and computer systems are also provided by the present invention for rational
drug design (RDD). These RDD methods use computer modeling programs to find potential ligands that are calculated
to associate with, or bind to, sites or domains of a PPCA or a pPPCA. Potential ligands are then screened for modulating
or binding activity. Such screening methods can be selected from assays for at least one PPCA-specific structural feature
or biological activity, preferably as associated with a PPCA- or pPPCA-related pathology, e.g., protective activity (e.g.,
modulation of β-galactosidase activity and neuraminidase (NA) activity); and peptide or enzyme modulating activity
(e.g., of endothelin I (serine carboxypeptidase), neuropeptides, cathepsin A, and the like), according to known assays.
The resulting ligands provided by methods of the present invention are synthesized and are useful for treating, inhibiting
or preventing at least one PPCA-related pathology in a mammal.

Other objects of the invention will be apparent to one of ordinary skill in the art from the following detailed 
description and examples relating to the present invention. 

Brief Description of the Figures 
Figure 1 is a schematic ribbon diagram of the PPCA monomer (monomer 1), where secondary structure
assignments are according to DSSP (Kabsch and Sander, Biopolymers 22:2577-2637 (1983)). The 'core' domain is
shown in yellow. The 'cap' domain consists of a 'helical' subdomain, in red, and a 'maturation' subdomain, in orange.
The catalytic triad Ser150, His429 and Asp372 (from right to left) is shown by small green spheres. (Figure generated
using MOLSCRIPT (Kraulis, J. Appl. Cryst. 24:946-950 (1991)).)

Figure 2 is a stereo diagram of the Cα trace of PPCA monomer 1 with numbering of selected
residues. The residues forming the α-helices and β-strands are as follows, according to DSSP:

Core domain: Cβ1 (21-27); Cβ2 (32-39); Cβ3 (50-54); Cα1 (63-67); Cβ4 (73-75); Cβ5 (82-84); Cβ6 (94-98);
Cα2 (118-135); Cβ7 (144-149); Cα3 (152-163); Cβ8 (171-177); Cα4 (307-313); Cα5 (316-321); Cα6 (336-341);
Cα7 (350-359); Cβ9 (363-369); Cα8 (377-386); Cβ10 (391-401); Cβ11 (407-416); Cβ12 (419-424); Cα9 (431-434); Cα10 (436-447).
Cap domain: Hα1 (183-196); Hα2 (202-212); Hα3 (226-240); Mβ1 (261-264); Mβ2 (267-270); Mα1 (290-293);
Mβ3 (296-299). Note that for monomer 2 the secondary structure assignments in the cap domain are slightly different
than in monomer 1. Residues in Hβ1 are in a region of poor density and Mα1 is an extended coil. (Figure generated
using MOLSCRIPT (Kraulis (1991), supra).)
Figure 3 shows the density for the disulfide bridges Cys212-Cys228 and Cys213-Cys218 as
revealed in the SigmaA-weighted 2mFo-DFc electron density map (Read, Acta Crystallogr. A42:140-149 (1986))
calculated from the model refined to 2.2 Å; the map has been contoured at 1σ. (Figure drawn with the O computer
program (Jones, Acta Crystallogr. A47:110-119 (1991)).)
Figure 4 is a stereo diagram of the superimposed Cα traces from the two crystallographically
independent PPCA monomers forming the dimer. Monomer 1 is in blue, monomer 2 is in red. Residues referred to in
the text are labeled. Residues 259 and 260 have not been incorporated in the model of monomer 2, since no electron
density was observed for them. Note the tremendous difference in conformation of the excision peptide located in the
upper right corner of the proteins. (Figure generated by MOLSCRIPT (Kraulis (1991), supra).)

Figure 5 is a schematic ribbon diagram of the PPCA dimer viewed approximately along the twofold
axis. For monomer 1, the core domain is yellow while the cap domain consists of a helical subdomain in red and
a maturation subdomain in orange. For monomer 2, the core domain is green, while the cap domain consists of a blue
helical subdomain and a light blue maturation subdomain. (Figure generated using MOLSCRIPT (Kraulis (1991), supra).)

Figure 6A-B is a representation of the molecular surface of the PPCA dimer. The surface was calculated with
GRASP (Nicholls, A., et al., Proteins 11:281-296 (1991)) and colored according to the electrostatic potential. Dark blue
corresponds to a positive potential > +15.0 kT/e and dark red to a negative potential < -15.0 kT/e. Figure 6A: standard
view, along the diad with the dimer oriented as in Figure 4. Figure 6B: side view of the dimer, rotated ninety degrees
with respect to 6A.

Figure 7A-F presents a topological comparison of 6 members of the hydrolase fold family. The arrangement
of structural elements in the central core domain (in green and yellow) of the different proteins is generally similar. The
cap domains (in red) vary greatly. The following structures are shown starting from the top left hand corner (references
and PDB entry codes are given in brackets): Figure 7A shows the PPCA precursor cap domain, which consists of
two subdomains, one α-helical and the other mainly β-sheet. Figure 7B shows CPW (3SC2, Liao et al. (1992), infra), cap
domain helical; Figure 7C shows CPY (1YSC, Endrizzi et al. (1994), infra), cap domain helical; Figure 7D shows
dehalogenase (2HAD, Franken et al., EMBO J. 10:1297-1302 (1991)), cap domain helical but quite different from the
serine carboxypeptidases; Figure 7E shows lipase from Pseudomonas glumae (1TAH, Noble et al., FEBS Lett. 331:123-128
(1993)), cap domain mixed α-helical and β-strands; and Figure 7F shows acetylcholinesterase (1ACE, Sussman
et al., Science 253:872-879 (1991)), cap domain large and predominantly α-helical. The secondary structure
assignments were generated with the computer program O, using structures provided and/or available from the
Brookhaven Protein Data Bank. (This Figure was generated using MOLSCRIPT (Kraulis (1991), supra).)

Figure 8A shows the superposition of the Cα traces from the PPCA and CPW monomers, showing that the
major differences between the two enzymes are localized in the cap domain. PPCA has a large 'maturation' subdomain
and the 'helical' subdomain is rotated with respect to the CPW counterpart (Figure drawn with the O program (Jones
(1991), supra)). Figure 8B shows the Cα traces from the PPCA and CPW dimers after the core domains from the subunits
(shown on the right hand side of the two dimers) have been superimposed. Notice the remarkable difference in mutual
orientation (of 15°) of the two subunits on the left hand side of the two dimers, which has been accentuated by an arrow.
(Figure drawn with the O computer program (Jones (1991), supra).)

Figure 9 is a stereo view of the Cα trace of PPCA monomer 1 highlighting regions involved in the maturation
event. Color scheme for the trace is as follows: core domain in light blue, helical subdomain in red, maturation
subdomain in orange with the exception of the excision peptide (residues 285-298), which is shown in blue. Orange
spheres mark residues 272 and 277, the beginning and end of the blocking peptide. The catalytic triad Ser150,
His429 and Asp372 is shown as light blue spheres. Two cysteines, Cys253 and Cys303, referred to in the
discussion are colored green. (This Figure was generated using MOLSCRIPT (Kraulis (1991), supra).)

Figure 10 is a close-up representation of the 'blocking' peptide (residues 272-277) bound in the active site,
rendering the catalytic triad solvent inaccessible. Residues from the maturation subdomain are shown in orange, residues
from the helical domain in magenta and residues from the core domain in [...]. The excision peptide is shown in blue.
Side chains are shown for residues making extensive contacts with the blocking peptide [...]. The catalytic triad is
shown in white. (Figure drawn with O (Jones (1991), supra).)

[...] labeled. Rearrangement of the residues 254-302 [...] Cys [...] the cleft. A charge cluster, Arg262, Glu264,
Arg298 and Asp300, [...] possibly involved in pH-dependent regulation [...] maturation. [...] was calculated and
visualized with the atomic coordinates by BIOGRAF (BIOGRAF Construct User's Guide, Version 3.2, [...] June 1993).

Figure 12 is a schematic representation of the proposed activation of PPCA. The active site cleft is formed by
the core domain (indicated as 'core' in the above scheme) and the helical subdomain (indicated [...]). The maturation
subdomain (indicated as 'm') contains [...] inactive (shown in structure [...]). [...] the acidic endosome [...].
In activation pathway 2a, conformational rearrangements induced by low pH might render the [...] proteases as a
first step, followed by cleavage of the polypeptide [...]. In pathway 2b, proteolytic cleavage of the excision peptide
might form the trigger for the [...] the blocking peptide [...] active site [...].

Figure 13 shows the amino acid sequence of a human pPPCA. The underlined portion (residues 285-298)
shows an excision peptide for conversion to the mature form, PPCA.

Figure 14 shows the amino acid sequence of a human PPCA. 

Figure 15 shows a sequence alignment between pPPCA, CPW and CPY. [...] residues among all three sequences
are boxed. Residue numbers [...]. [...] knowledge from the superposition of the CPW (Liao et al., 1992) and
CPY (Endrizzi et al., 1994) [...]. The alignment was later used to design a [...] search probe for molecular
replacement [...]. [...] structure determination of pPPCA [...] the protein [...] be divided in two domains: a 'core'
domain (residues 1-182 and 303-452) and a 'cap' domain (residues 183-302). The secondary structure elements for the
PPCA precursor are depicted with shaded bars (for details on the assignment and nomenclature, see Rudenko et al.,
Structure 3:1249-1259 (1995)).

Figure 16 shows a schematic representation of a 'bootstrapping' cycle as described in Example 2.
Figure 17 is a representation of an initial molecular mask enlarged to accommodate missing areas in the model.

Figure 18 is a representation of an enlargement of the model during the bootstrapping procedure [...], as
well as [...]. In the initial [...] cycles [...].

Figure 19 is a representation of a comparison of the Cα trace from a monomer core model [...] and the complete
PPCA monomer (shown in yellow; [...]). The PPCA monomer consists of a core domain and a cap domain. The helical
subdomain and the maturation subdomain forming the cap domain are indicated in the figure.
Figure 20A-D is a representation of the resolving power of the bootstrapping procedure, showing three different
stages in map quality. The atomic coordinates of the refined model are visualized with the electron density in Figures
20B, 20C and 20D. Figures 20A and 20B show the initial 2m|Fobs|-D|Fcalc| SigmaA-weighted map calculated using
phases from the molecular replacement solution. The electron density is essentially uninterpretable. Fig. 20C shows a
twofold-averaged 2|Fobs|-|Fcalc| electron density map calculated using inverted phases from cycle bmc6. The density for
β-strand Mβ2 (residues 266-271) has become clearly visible. Fig. 20D shows an unaveraged 2m|Fobs|-D|Fcalc|
SigmaA-weighted map calculated using phases from the refined model. The quality of the density is very good. Density for the
helix Mα1 (residues 287-293), which assumes a different conformation in the two monomers, is now also apparent.

Figure 21 shows a Ramachandran plot calculated for one monomer from a refined model of a pPPCA. Both
monomers in the asymmetric unit give essentially equivalent plots.

Figure 22 shows a schematic of a computer system for PPCA or pPPCA structure determination and/or rational
drug design.

Figure 23.1-23.52 lists the atomic coordinates for the active site of a pPPCA dimer having the amino acid
sequence presented as portions of at least one of residues 50-76, 144-155, 173-197, 226-253, 226-288, 294-310, 327-344,
338-350, 366-38, and 423-436 of the 452 amino acids (designated 1-452) of monomer 1 (Figure 23.1-23.26), as well as
corresponding portions of the 452 amino acids (designated 1001-1452) of monomer 2 (Figure 23.26-23.52).

Detailed Description of the Preferred Embodiments 
The present invention provides methods for expressing, purifying and crystallizing a protective
protein/cathepsin A (PPCA) or a precursor protective protein/cathepsin A (pPPCA), where the crystals diffract x-rays
with sufficiently high resolution to allow determination of the three-dimensional structure of the PPCA or pPPCA, or
a portion or subdomain thereof. The three-dimensional structure (e.g., as provided on computer readable media of the
present invention) is useful for rational drug design of ligands of a PPCA or a pPPCA. Such ligands can be synthesized
and are useful as diagnostic agents or for treating, inhibiting or preventing
at least one PPCA- or pPPCA-related pathology.

The three-dimensional structure is determined using PPCA or pPPCA amino acid sequence and/or atomic coordinate/x-ray
diffraction data [...], which [...] provide atomic [...] on computer readable media. The computer analysis of the
coordinate/x-ray diffraction data and/or the amino acid sequence allows the calculation of the secondary, tertiary
and/or quaternary structures, domains and/or subdomains of the protein. These domains are combined and refined
[...] to determine the most probable [...] PPCA or pPPCA [...].

[...] are also provided by the present invention for rational drug design (RDD) of
PPCA or pPPCA ligands. Such drug design uses computer modeling programs that calculate different molecules
[...] the determined active sites, binding sites, or other [...] subdomains of a PPCA or a pPPCA. These ligands can then
be produced and screened for activity in modulating [...] or binding to a PPCA or pPPCA, according to methods and
compositions of the present invention.

The actual PPCA or pPPCA-ligand complexes can optionally be crystallized and analyzed using x-ray
diffraction techniques. The diffraction patterns obtained are similarly used to calculate the three-dimensional [...]
domain(s) or subdomain(s) of the PPCA or pPPCA. Such screening methods are selected from assays for at least one
biological activity of a PPCA or a pPPCA. The resulting ligands, provided by methods of the present invention, [...]
at least one PPCA or pPPCA and are useful for [...], such as humans. Ligands of a particular PPCA or pPPCA [...]
PPCAs or pPPCAs from other sources, such as other eukaryotes.
A PPCA or pPPCA is also provided as a crystallized protein suitable for x-ray diffraction analysis. The x-ray
diffraction patterns obtained by the x-ray analysis are of moderate, moderately high or high resolution, e.g., 30-10,
10-3.5 or 1.5-3.5 Å, respectively, with the higher resolutions included. These diffraction patterns are suitable and useful
for three-dimensional structure determination of a PPCA or a pPPCA, or a domain or subdomain thereof.
The determination of the three-dimensional structure of a PPCA or pPPCA has broad utility.

Significant sequence identity and conservation of important structural elements are expected to exist among different
PPCAs or pPPCAs. Therefore, the three-dimensional structure of one or a few PPCAs or pPPCAs can be used to
identify ligands that have diagnostic or therapeutic value for at least one PPCA- or pPPCA-related pathology that may
involve PPCAs or pPPCAs having different amino acid sequences.

Determination of Protein Structures

Different techniques give different and complementary information about protein structure. The primary 
structure is obtained by biochemical methods, either by direct determination of the amino acid sequence from the 
protein, or from the nucleotide sequence of the corresponding gene or cDNA. The quaternary structure of large proteins 
or aggregates can also be determined by electron microscopy. To obtain the secondary and tertiary structure, which 

requires detailed information about the arrangement of atoms within a protein, x-ray crystallography is preferred. See,
e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra.

The first prerequisite for solving the three-dimensional structure of a protein by x-ray crystallography is a
well-ordered crystal that will diffract x-rays strongly. The crystallographic method directs a beam of x-rays onto a regular,
repeating array of many identical molecules so that the x-rays are diffracted from it in a pattern from which the structure
of an individual molecule can be retrieved. Globular protein molecules are large, roughly spherical or ellipsoidal objects
with irregular surfaces, and their crystals therefore contain large holes or channels that are formed between
the individual molecules. These channels, which usually occupy more than half the volume of the crystal, are filled with
disordered solvent molecules. The protein molecules are in contact with each other at only a few small regions. This
is one reason why structures of proteins determined by x-ray crystallography are generally the same as those for the
proteins in solution.
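The solvent content mentioned above can be estimated from the unit-cell volume with the Matthews coefficient, V_M = V_cell / (Z · M), where Z is the number of protein molecules in the cell and M is their molecular weight. The sketch below assumes the commonly used approximation that the solvent fraction is about 1 - 1.23/V_M (which corresponds to a protein partial specific volume near 0.74 cm³/g). The cell volume, copy number and molecular weight in the example are hypothetical and are not PPCA values.

```python
def matthews_coefficient(cell_volume_a3: float, mol_weight_da: float, molecules_per_cell: int) -> float:
    """Matthews coefficient V_M: cubic angstroms of unit cell per dalton of protein."""
    return cell_volume_a3 / (mol_weight_da * molecules_per_cell)


def solvent_fraction(v_m: float) -> float:
    """Approximate solvent fraction, assuming a protein partial specific volume of ~0.74 cm^3/g."""
    return 1.0 - 1.23 / v_m


# Hypothetical cell: 1.0e6 A^3 holding 8 copies of a 50 kDa protein.
vm = matthews_coefficient(1.0e6, 50_000, 8)
print(f"V_M = {vm:.2f} A^3/Da, solvent fraction ~ {solvent_fraction(vm):.2f}")
```

With these made-up numbers V_M is 2.5 Å³/Da and the estimated solvent fraction is about 0.51, i.e. slightly more than half of the crystal volume.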

The formation of crystals is dependent on a number of different parameters, including pH, temperature, protein
concentration, the nature of the solvent and precipitant, as well as the presence of added ions or ligands to the protein.
Many routine crystallization experiments may be needed to screen all these parameters for the few combinations that
might give crystals suitable for x-ray diffraction analysis. Crystallization robots can automate and speed up the work of
reproducibly setting up large numbers of crystallization experiments.
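One systematic way to cover such parameters is a grid screen, in which two variables are varied along the rows and columns of a crystallization plate. The sketch below is illustrative only: the function name `grid_screen`, the choice of pH and PEG concentration as the two variables, and the 4 × 6 (24-well) layout are assumptions, not conditions used for PPCA.

```python
def grid_screen(ph_values, peg_percent_values):
    """Map well labels (A1, A2, ...) to one pH/precipitant combination each.

    Rows vary pH and columns vary PEG concentration, as on a 4 x 6 (24-well) plate.
    """
    if len(ph_values) > 26:
        raise ValueError("at most 26 rows can be labeled A-Z")
    plate = {}
    for row, ph in enumerate(ph_values):
        for col, peg in enumerate(peg_percent_values):
            label = f"{chr(ord('A') + row)}{col + 1}"
            plate[label] = {"pH": ph, "PEG_percent": peg}
    return plate


# Hypothetical ranges; real screens would also vary salt, buffer, additives and temperature.
plate = grid_screen([4.5, 5.5, 6.5, 7.5], [5, 10, 15, 20, 25, 30])
for label in ("A1", "B3", "D6"):
    print(label, plate[label])
```

A coarse grid like this is typically followed by a finer grid around any condition that gives crystals.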

A pure and homogeneous protein sample is important for successful crystallization. Proteins obtained from
cloned genes in efficient expression vectors can be purified quickly to homogeneity in large quantities in a few
purification steps. A protein to be crystallized is preferably at least 93-99% pure according to standard criteria of
homogeneity. Crystals form when molecules are precipitated very slowly from supersaturated solutions. The most
frequently used procedure for making protein crystals is the hanging-drop method, in which a drop of protein solution
is brought very gradually to supersaturation by loss of water from the droplet to the larger reservoir that contains salt
or polyethylene glycol solution.
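The gradual concentration step of the hanging-drop method can be illustrated with an idealized mass balance. The sketch below assumes that the precipitant is non-volatile, that the reservoir is large enough that its concentration does not change, and that water leaves the drop until the drop's precipitant concentration equals the reservoir's; the function name and example volumes are hypothetical.

```python
def hanging_drop_equilibrium(protein_ul, reservoir_ul, protein_mg_ml, precipitant_pct):
    """Idealized start and end points of a hanging drop made of protein + reservoir solution."""
    total_ul = protein_ul + reservoir_ul
    start = {
        "volume_ul": total_ul,
        "protein_mg_ml": protein_mg_ml * protein_ul / total_ul,
        "precipitant_pct": precipitant_pct * reservoir_ul / total_ul,
    }
    # The drop contains precipitant_pct * reservoir_ul of precipitant; at equilibrium that
    # amount is at the reservoir concentration, so the drop shrinks to reservoir_ul.
    final_ul = reservoir_ul
    end = {
        "volume_ul": final_ul,
        "protein_mg_ml": protein_mg_ml * protein_ul / final_ul,
        "precipitant_pct": precipitant_pct,
    }
    return start, end


# Hypothetical 1:1 drop: 2 uL of 10 mg/ml protein + 2 uL of 20% PEG reservoir solution.
start, end = hanging_drop_equilibrium(2.0, 2.0, 10.0, 20.0)
print("start:", start)
print("end:  ", end)
```

In this 1:1 case both the protein and the precipitant concentrations double as the drop loses half its volume, which is the slow approach to supersaturation described above.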

Different crystal forms can be more or less well-ordered and hence give diffraction patterns of different quality.
As a general rule, the more closely the protein molecules pack, and consequently the less water the crystals contain, the
better is the diffraction pattern because the molecules are better ordered in the crystal.

X-rays are electromagnetic radiation at short wavelengths, emitted when electrons jump from a higher to a
lower energy state. In conventional sources in the laboratory, x-rays are produced by high-voltage tubes in which a
metal plate, the anode, is bombarded with accelerated electrons and thereby caused to emit x-rays of a specific
wavelength, so-called monochromatic x-rays. The electron bombardment rapidly heats up the metal plate, which therefore has
to be cooled. Efficient cooling is achieved by so-called rotating anode x-ray generators, where the metal plate revolves
during the experiment so that different parts are heated up.
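The wavelength of such a monochromatic source follows from the photon energy through λ = hc/E. The sketch below uses the exact SI values of h, c and the elementary charge; 8.048 keV is used only as an approximate copper Kα energy, which gives a wavelength of about 1.54 Å. The function name is illustrative.

```python
PLANCK_J_S = 6.62607015e-34            # exact SI value
SPEED_OF_LIGHT_M_S = 299_792_458.0     # exact SI value
ELEMENTARY_CHARGE_C = 1.602176634e-19  # exact SI value (joules per electronvolt)


def wavelength_angstrom(photon_energy_kev: float) -> float:
    """X-ray wavelength in angstroms from photon energy in keV, lambda = h*c/E."""
    energy_joule = photon_energy_kev * 1.0e3 * ELEMENTARY_CHARGE_C
    return PLANCK_J_S * SPEED_OF_LIGHT_M_S / energy_joule * 1.0e10


print(f"{wavelength_angstrom(8.048):.4f} A")  # roughly the copper K-alpha line
```

The same relation applies to synchrotron beamlines, where the energy (and hence the wavelength) is selected from a broad spectrum.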

More powerful x-ray beams can be produced in synchrotron storage rings where electrons (or positrons) travel
close to the speed of light. These particles emit very strong radiation at all wavelengths from short gamma rays to visible
light. When used as an x-ray source, only radiation within a window of suitable wavelengths is channeled from the
storage ring. Polychromatic x-ray beams are produced by having a broad window that allows through x-ray radiation
with wavelengths of 0.2-3.5 Å.

In diffraction experiments a narrow and parallel beam of x-rays is taken out from the x-ray source and directed 
onto the crystal to produce diffracted beams. The incident primary beam causes damage to both protein and solvent 
molecules. The crystal is, therefore, usually cooled to prolong its lifetime (e.g., -220 to -50°C). The primary beam must 
strike the crystal from many different directions to produce all possible diffraction spots, and so the crystal is rotated 
in the beam during the experiment. 

The diffracted spots are recorded either on a film, the classical method, or by an electronic detector. The
exposed film has to be measured and digitized by a scanning device, whereas electronic detectors feed the signals they
detect directly in a digitized form into a computer. Electronic area detectors (an electronic film) significantly reduce
the time required to collect and measure diffraction data.

When the primary beam from an x-ray source strikes the crystal, some of the x-rays interact with the electrons 
on each atom and cause them to oscillate. The oscillating electrons serve as a new source of x-rays, which are emitted 
in almost all directions, referred to as scattering. When atoms (and hence their electrons) are arranged in a regular
three-dimensional array, as in a crystal, the x-rays emitted from the oscillating electrons interfere with one another. In most
cases, these x-rays, colliding from different directions, cancel each other out; those from certain directions, however, 
will add together to produce diffracted beams of radiation that can be recorded as a pattern on a photographic plate or 
detector. 

The diffraction pattern obtained in an x-ray experiment is related to the crystal that caused the diffraction.
X-rays that are reflected from adjacent planes travel different distances, and diffraction only occurs when the difference
in distance is equal to an integer multiple of the wavelength of the x-ray beam. This distance is dependent on the
reflection angle, which is equal to the angle between the primary beam and the planes.

The relationship between the reflection angle (θ), the distance between the planes (d), and the wavelength (λ)
is given by Bragg's law, $2d\sin\theta = n\lambda$, where n is an integer (n = 1 for first-order reflections). This relation
can be used to determine the size of the unit cell in the crystal. Briefly, the position on the film of each diffraction spot
relates it to a specific set of planes through the crystal. By using Bragg's law, these positions can be used to determine
the size of the unit cell.
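As a minimal sketch of that calculation, the code below solves Bragg's law for d and, for the simple case of a unit cell whose angles are all 90°, relates d to the cell edges through 1/d² = (h/a)² + (k/b)² + (l/c)². The orthogonal-cell restriction, the function names and the example wavelength and angle are assumptions made for illustration; general (non-orthogonal) cells need the full reciprocal-metric expression.

```python
import math


def bragg_d_spacing(wavelength: float, theta_deg: float, order: int = 1) -> float:
    """Interplanar spacing d from Bragg's law, 2 d sin(theta) = n * lambda."""
    return order * wavelength / (2.0 * math.sin(math.radians(theta_deg)))


def d_spacing_orthogonal_cell(a: float, b: float, c: float, h: int, k: int, l: int) -> float:
    """d(hkl) for a unit cell with all angles 90 degrees: 1/d^2 = (h/a)^2 + (k/b)^2 + (l/c)^2."""
    return 1.0 / math.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)


# A hypothetical low-angle spot at theta = 0.5 degrees recorded with 1.5418 A x-rays:
d = bragg_d_spacing(1.5418, 0.5)
print(f"d = {d:.1f} A")  # if this spot is indexed as (1,0,0) of an orthogonal cell, d equals the edge a
```

Because d is inversely related to sin θ, the large unit cells of protein crystals produce closely spaced spots at small diffraction angles.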

Each atom in a crystal scatters x-rays in all directions, and only those that positively interfere with one another,
according to Bragg's law, give rise to diffracted beams that can be recorded as a distinct diffraction spot above
background. Each diffraction spot is the result of interference of all x-rays with the same diffraction angle emerging
from all atoms. For example, for the protein crystal of myoglobin, each of the about 20,000 diffracted beams that have
been measured contains scattered x-rays from each of the around 1500 atoms in the molecule. To extract information
about individual atoms from such a system requires considerable computation. The mathematical tool that is used to
handle such problems is called the Fourier transform.
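The Fourier sum behind each diffraction spot is the structure factor, F(hkl) = Σ_j f_j exp[2πi(hx_j + ky_j + lz_j)], which adds the waves scattered by every atom j at fractional coordinates (x_j, y_j, z_j). The minimal sketch below treats each scattering factor f_j as a constant number of electrons, ignoring its fall-off with scattering angle and thermal motion, so it is a simplified model rather than production crystallographic code.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """F(hkl) = sum_j f_j * exp(2*pi*i*(h*x_j + k*y_j + l*z_j)).

    atoms: iterable of (f, (x, y, z)) with fractional coordinates; f is a constant
    electron count (a simplifying assumption).
    """
    h, k, l = hkl
    return sum(
        f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
        for f, (x, y, z) in atoms
    )


# Two identical carbon-like atoms (6 electrons) half a cell apart along x.
atoms = [(6.0, (0.0, 0.0, 0.0)), (6.0, (0.5, 0.0, 0.0))]
for hkl in [(1, 0, 0), (2, 0, 0)]:
    F = structure_factor(hkl, atoms)
    print(hkl, f"|F| = {abs(F):.3f}")
```

For this arrangement the two scattered waves cancel for the (1,0,0) reflection and add for (2,0,0), a small instance of the destructive and constructive interference described above.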

Each diffracted beam, which is recorded as a spot on the film, is defined by three properties: the amplitude,
which can be measured from the intensity of the spot; the wavelength, which is set by the x-ray source; and the phase,
which is lost in x-ray experiments. All three properties are needed for all of the diffracted beams in order to determine
the positions of the atoms giving rise to the diffracted beams.
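The loss of the phase matters because the electron density is obtained by a Fourier synthesis that needs both the amplitude |F| and the phase φ: ρ(x) ∝ Σ_h |F(h)| exp(iφ(h)) exp(-2πihx). The one-dimensional toy sketch below, with a single hypothetical atom at x = 0.25 in a unit cell, recomputes the density from the same amplitudes twice, once with the correct phases and once with all phases set to zero. It is a teaching model, not the procedure used for PPCA.

```python
import cmath
import math


def structure_factors_1d(atoms, hmax):
    """F(h) = sum_j f_j exp(2*pi*i*h*x_j) for point atoms (f, x) in a 1-D cell."""
    return {
        h: sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)
        for h in range(-hmax, hmax + 1)
    }


def density_1d(amplitudes, phases, npts):
    """Fourier synthesis rho(x) = sum_h |F(h)| exp(i*phi(h)) exp(-2*pi*i*h*x) at npts points."""
    rho = []
    for i in range(npts):
        x = i / npts
        total = sum(
            amp * cmath.exp(1j * phases[h]) * cmath.exp(-2j * math.pi * h * x)
            for h, amp in amplitudes.items()
        )
        rho.append(total.real)  # coefficients for h and -h are conjugates, so the sum is real
    return rho


def peak_index(rho):
    return max(range(len(rho)), key=rho.__getitem__)


F = structure_factors_1d([(8.0, 0.25)], hmax=10)  # one hypothetical atom at x = 0.25
amplitudes = {h: abs(v) for h, v in F.items()}
true_phases = {h: cmath.phase(v) for h, v in F.items()}
zero_phases = {h: 0.0 for h in F}

rho_true = density_1d(amplitudes, true_phases, 100)
rho_zero = density_1d(amplitudes, zero_phases, 100)
print(peak_index(rho_true) / 100, peak_index(rho_zero) / 100)
```

With the correct phases the density peak sits at the atom position (x = 0.25); with zero phases it moves to the origin, even though the amplitudes are identical. The amplitudes alone therefore do not locate the atom.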

For larger molecules, protein crystallographers have determined the phases in many cases using a method called 
multiple isomorphous replacement (MIR) (including heavy metal scattering), which requires the introduction of new 
x-ray scatterers into the unit cell of the crystal. These additions are usually heavy atoms (so that they make a significant
contribution to the diffraction pattern); there should not be too many of them (so that their positions can be
located); and they should not change the structure of the molecule or of the crystal cell, i.e. the crystals should be 
isomorphous. Isomorphous replacement is usually done by diffusing different heavy-metal complexes into the channels 
of the preformed protein crystals. The protein molecules expose side chains (such as SH groups) into these solvent 
channels that are able to bind heavy metals. It is also possible to replace endogenous light metals in metalloproteins with 
heavier ones, e.g., zinc by mercury, or calcium by samarium. 

Since such heavy metals contain many more electrons than the light atoms (H, N, C, O and S) of the protein,
they scatter x-rays more strongly. All diffracted beams would therefore increase in intensity after heavy-metal
substitution if all interference were positive. In fact, however, some interference is negative; consequently, following
heavy-metal substitution, some spots measurably increase in intensity, others decrease, and many show no detectable 
difference. 

Phase differences between diffracted spots can be determined from intensity changes following heavy-metal 
substitution. First, the intensity differences are used to deduce the positions of the heavy atoms in the crystal unit cell. 
Fourier summations of these intensity differences give maps of the vectors between the heavy atoms, the so-called 
Patterson maps. From these vector maps the atomic arrangement of the heavy atoms is deduced. From the positions
of the heavy metals in the unit cell, one can calculate the amplitudes and phases of their contribution to the diffracted 
beams of protein crystals containing heavy metals. 
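A Patterson map is a Fourier summation of the measured intensities |F|² without any phases, P(u) = Σ_h |F(h)|² cos(2πhu), and its peaks lie at the vectors between atoms. The one-dimensional sketch below uses two hypothetical heavy atoms at x = 0.10 and x = 0.30, each assumed to scatter like 80 electrons, and finds the largest peak away from the origin at a vector length of 0.20, the separation of the two atoms. Real Patterson maps are three-dimensional and much noisier.

```python
import math


def patterson_1d(atoms, hmax, npts):
    """Sample P(u) = sum_h |F(h)|^2 cos(2*pi*h*u) on npts points of a 1-D cell.

    atoms: (f, x) pairs with fractional x. Only intensities |F|^2 are used, no phases.
    """
    def intensity(h):
        re = sum(f * math.cos(2 * math.pi * h * x) for f, x in atoms)
        im = sum(f * math.sin(2 * math.pi * h * x) for f, x in atoms)
        return re * re + im * im

    intensities = {h: intensity(h) for h in range(-hmax, hmax + 1)}
    return [
        sum(i_h * math.cos(2 * math.pi * h * j / npts) for h, i_h in intensities.items())
        for j in range(npts)
    ]


# Hypothetical heavy-atom sites at x = 0.10 and x = 0.30 (80 electrons each).
heavy_atoms = [(80.0, 0.10), (80.0, 0.30)]
P = patterson_1d(heavy_atoms, hmax=15, npts=100)

# Skip the large origin peak (each atom's vector to itself) and find the next one.
peak = max(range(10, 91), key=P.__getitem__)
print("interatomic vector:", min(peak, 100 - peak) / 100)
```

Because the Patterson function is centrosymmetric, the same vector also appears at u = 0.80 (that is, -0.20), which is why the distance is folded back with `min(peak, 100 - peak)`.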

This knowledge is then used to find the phase of the contribution from the protein in the absence of the
heavy-metal atoms. As both the phase and amplitude of the heavy metals and the amplitude of the protein alone are known, as
well as the amplitude of the protein plus heavy metals (protein heavy-metal complex), one phase and three
amplitudes are known. From this, the interference of the x-rays scattered by the heavy metals and protein can be
calculated to see if it is constructive or destructive. The extent of positive or negative interference, with knowledge of
the phase of the heavy metal, gives an estimate of the phase of the protein. Because two different phase angles are
determined and are equally good solutions, a second heavy-metal complex can be used which also gives two possible
phase angles. Only one of these will have the same value as one of the two previous phase angles; it therefore represents
the correct phase angle. In practice, more than two different heavy-metal complexes are usually made in order to give
a reasonably good phase determination for all reflections. Each individual phase estimate contains experimental errors
arising from errors in the measured amplitudes. Furthermore, for many reflections, the intensity differences are too small
to measure after one particular isomorphous replacement, and others can be tried.
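The two-fold ambiguity described above follows from F_PH = F_P + F_H, which gives |F_PH|² = |F_P|² + |F_H|² + 2|F_P||F_H| cos(φ_P - φ_H), so φ_P = φ_H ± arccos[(|F_PH|² - |F_P|² - |F_H|²) / (2|F_P||F_H|)]. The sketch below is an idealized, error-free version of this construction: it ignores measurement errors, lack of isomorphism and anomalous scattering, and simply picks the candidate from the first derivative that best agrees with the candidates from the other derivatives. The function names and all numerical values are invented for illustration.

```python
import cmath
import math

TWO_PI = 2.0 * math.pi


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in radians."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)


def candidate_phases(fp_amp: float, fph_amp: float, fh: complex) -> list[float]:
    """The two protein phases consistent with |F_PH| = |F_P + F_H| for one derivative."""
    fh_amp, phi_h = abs(fh), cmath.phase(fh)
    cos_term = (fph_amp ** 2 - fp_amp ** 2 - fh_amp ** 2) / (2.0 * fp_amp * fh_amp)
    delta = math.acos(max(-1.0, min(1.0, cos_term)))  # clamp rounding noise into [-1, 1]
    return [(phi_h + delta) % TWO_PI, (phi_h - delta) % TWO_PI]


def resolve_phase(fp_amp: float, derivatives: list[tuple[float, complex]]) -> float:
    """Choose the first derivative's candidate that best agrees with all other derivatives.

    derivatives: (|F_PH|, F_H) pairs, with F_H calculated from the known heavy-atom positions.
    """
    candidate_sets = [candidate_phases(fp_amp, fph, fh) for fph, fh in derivatives]

    def disagreement(phi: float) -> float:
        return sum(min(angular_distance(phi, q) for q in other) for other in candidate_sets[1:])

    return min(candidate_sets[0], key=disagreement)


# Invented, error-free example: the true protein phase is 1.0 rad.
f_p = 100.0 * cmath.exp(1.0j)
derivs = [(abs(f_p + fh), fh) for fh in (30.0 * cmath.exp(0.2j), 25.0 * cmath.exp(2.5j))]
print(round(resolve_phase(abs(f_p), derivs), 3))
```

The first derivative alone yields 1.0 rad and about 5.68 rad as equally good answers; only 1.0 rad also appears among the second derivative's candidates, mirroring the argument in the text.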
The amplitudes and the phases of the diffraction data from the protein crystals are used to calculate an
electron-density map of the repeating unit of the crystal. This map then has to be interpreted as a polypeptide chain with a
particular amino acid sequence. The interpretation of the electron-density map is made more complex by several
limitations of the data. First of all, the map itself contains errors, mainly due to errors in the phase angles. In addition,
the quality of the map depends on the resolution of the diffraction data, which in turn depends on how well-ordered the
crystals are. This directly influences the image that can be produced. The resolution is measured in Å units; the smaller
this number is, the higher the resolution and therefore the greater the amount of detail that can be seen.
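Combining Bragg's law with the detector geometry gives the best resolution that can be recorded at the edge of a detector: d_min = λ / (2 sin θ_max), where 2θ_max = arctan(r/D) for a flat detector of radius r at crystal-to-detector distance D. The sketch assumes a first-order reflection and a detector centred on, and perpendicular to, the direct beam; the function name and the numbers in the example are hypothetical.

```python
import math


def max_resolution(wavelength: float, detector_radius_mm: float, distance_mm: float) -> float:
    """Smallest d-spacing recorded at the detector edge (flat detector, beam at its centre).

    The edge subtends 2*theta = atan(radius / distance); Bragg's law (n = 1) then gives d.
    """
    two_theta = math.atan(detector_radius_mm / distance_mm)
    return wavelength / (2.0 * math.sin(two_theta / 2.0))


# Hypothetical set-up: 1.5418 A x-rays and a detector of 100 mm radius at several distances.
for distance in (300.0, 150.0, 100.0):
    print(f"{distance:5.0f} mm -> d_min = {max_resolution(1.5418, 100.0, distance):.2f} A")
```

Moving the detector closer to the crystal records reflections at higher angles and therefore smaller d-spacings, i.e. higher resolution, provided the crystal actually diffracts that far.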

Building the initial model is a trial-and-error process. First, one has to decide how the polypeptide chain
weaves its way through the electron-density map. The resulting chain trace constitutes a hypothesis, by which one tries
to match the density of the side chains to the known sequence of the polypeptide. When a reasonable chain trace has
finally been obtained, an initial model is built to give the best fit of the atoms to the electron density. Computer graphics
are used both for chain tracing and for model building to present the data and manipulate the models.

The initial model will contain some errors. Provided the protein crystals diffract to high enough resolution (e.g.,
better than 3.5 Å), most or substantially all of the errors can be removed by crystallographic refinement of the model
using computer algorithms. In this process, the model is changed to minimize the difference between the experimentally
observed diffraction amplitudes and those calculated for a hypothetical crystal containing the model (instead of the real
molecule). This difference is expressed as an R factor (residual disagreement), which is 0.0 for exact agreement and
about 0.59 for total disagreement.
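The R factor is conventionally computed as R = Σ||F_obs| - k|F_calc|| / Σ|F_obs|, summed over all measured reflections, where k scales the calculated amplitudes onto the observed ones. In the minimal sketch below, k is taken as the ratio of the summed amplitudes, which is one simple choice among several (least-squares scaling is also common); the function name and amplitude lists are made up for illustration.

```python
def r_factor(f_obs, f_calc, scale=None):
    """R = sum(| |Fobs| - k*|Fcalc| |) / sum(|Fobs|) over matched reflections."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need equal-length, non-empty amplitude lists")
    if scale is None:
        # Simple overall scale factor: ratio of summed amplitudes (an assumption).
        scale = sum(f_obs) / sum(f_calc)
    residual = sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
    return residual / sum(f_obs)


# Made-up observed and calculated amplitudes for three reflections.
print(f"R = {r_factor([100.0, 200.0, 300.0], [110.0, 190.0, 300.0]):.3f}")
```

A perfectly matching model (after scaling) gives R = 0.0, and the value rises as observed and calculated amplitudes diverge.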

In general, the R factor is preferably between 0.15 and 0.35 (such as less than about 0.24-0.28) for a well- 
determined protein structure. The residual difference is a consequence of errors and imperfections in the data. These 
derive from various sources, including slight variations in the conformation of the protein molecules, as well as 
inaccurate corrections both for the presence of solvent and for differences in the orientation of the microcrystals from
which the crystal is built. This means that the final model represents an average of molecules that are slightly different
both in conformation and orientation. 

In refined structures at high resolution, there are usually no major errors in the orientation of individual 
residues, and the estimated errors in atomic positions are usually around 0.1-0.2 Å, provided the amino acid sequence
is known. Hydrogen bonds, both within the protein and to bound ligands, can be identified with a high degree of 
confidence. 

Most x-ray structures are determined to a resolution between 1.7 Å and 3.5 Å. Electron-density maps with this
resolution range are preferably interpreted by fitting the known amino acid sequences into regions of electron density
in which individual atoms are not resolved.

An amino acid sequence is preferred for accurate x-ray structure determination. Thus, recombinant DNA
techniques have had a double impact on x-ray structural work. When a protein is cloned and overexpressed for structural
studies, the amino acid sequence, necessary for the x-ray work, is also quickly obtained via the nucleotide sequence.
Recombinant DNA techniques give us not only abundant supplies of rare proteins, but also their amino acid sequence
as a bonus. See, e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra.
Isolated PPCA and p PPCA Polypeptides 

A PPCA or pPPCA polypeptide can refer to any subset of a PPCA or pPPCA, such as a domain, subdomain,
fragment, consensus sequence or repeating unit thereof. A PPCA or pPPCA polypeptide of the present invention can
be prepared by, e.g.:

(a) recombinant DNA methods;

(b) proteolytic digestion of the intact molecule or a domain, subdomain or fragment thereof; 

(c) chemical peptide synthesis methods well-known in the art; and/or 

(d) any other method capable of producing a PPCA or pPPCA polypeptide having a conformation
similar to a structural or functional subdomain of a PPCA or a pPPCA.

A biological activity of PPCA or pPPCA can be screened according to known screening assays. The minimum
peptide sequence to have activity is based on the smallest unit containing or comprising a particular domain, subdomain,
fragment, region, consensus sequence, or repeating unit thereof, having at least one biological activity of a PPCA or
pPPCA, such as protecting activity, inhibiting activity or enzyme activity. Non-limiting examples of such activities are:
protecting activity for β-galactosidase or neuraminidase (NA); modulating activity (inhibition, stimulation or activation)
for endothelin I (serine carboxypeptidase) or cathepsin A; and peptide hydrolyzing activity (e.g., for substance P and
substance P-free acid; oxytocin and oxytocin-free acid; neurokinin A; angiotensin I; and bradykinin).

According to the present invention, a PPCA or pPPCA includes an association of two or more polypeptide
subdomains, such as at least one 4-amino-acid portion of a core or cap domain of a PPCA or pPPCA. This can include
1-14 subdomains of the cap domain and/or 1-44 subdomains of the core domain (as monomers or dimers), or any range,
value or combination thereof. Preferably, 1-4 sets of each of at least one core or cap domain or subdomain are
included.

The structure of a monomer or domain of at least one PPCA or pPPCA of the present invention can include one or more
of the following subdomains, as described herein. Generally, a PPCA or pPPCA consists of a dimer, each monomer having a
core domain and a cap domain with the following subdomains spanning the specified residues, e.g., as presented in
Figure 13 (pPPCA) or Figure 14 (PPCA):
Core domain subdomains: Cβ1, [...]-27; Cβ2, [...]; Cβ3, 50-54; Cα1, 63-67; Cβ4, 73-75; Cβ5, 82-[...]; [...]-135;
Cβ7, 144-[...]; Cα3, 152-163; Cβ8, 171-177; Cα4, 307-313; Cα5, 316-321; Cα6, 336-34[...]; [...]-359; Cβ9, 363-369;
Cα8, [...]-386; Cβ10, 39[...]-401; Cβ11, [...]; Cβ12, 419-424; Cα9, 431-434; Cα10, 436-447; and

Cap domain subdomains: Hα1, 183-196; Hα2, 202-212; Hα3, 226-240; Mβ1, 261-264; Mβ2, 267-270; Mα1, 290-293;
Mβ3, 296-299. Note that for monomer 2 the secondary structure assignments in the cap domain are slightly different
than in monomer 1.

A PPCA or pPPCA polypeptide of the invention can have at least 80% homology, such as 80-100% overall
homology or identity, with one or more corresponding PPCA or pPPCA subdomains or fragments as described herein,
such as a 4-542 amino acid fragment or portion of the amino acid sequence of Figures 13, 14 or 15. As would be
understood by one of ordinary skill in the art, the above configurations of subdomains are provided as part of a PPCA
or pPPCA polypeptide of the invention, when expressed in a suitable host cell or otherwise synthesized, to provide at
least one structural or functional feature of a native PPCA or pPPCA, such as at least one PPCA-related biological
activity. Such activities can be assayed using a suitable assay to establish at least one PPCA biological activity of one
or more PPCAs or pPPCAs of the invention. A PPCA or pPPCA polypeptide of the invention is not naturally occurring,
or is naturally occurring but is in a purified or isolated form which does not occur in nature. Examples of suitable PPCA
activity assays include, e.g., cathepsin A activity (Galjart et al., J. Biol. Chem. 266:14754-14762 (1991)); endothelin I
deamidase activity (Jackman et al., J. Biol. Chem. 267:2872-2875 (1992)); and tachykinin deamidase activity (Jackman
et al., J. Biol. Chem. 265:11265-11272 (1990)).

Percent homology or identity can be determined, for example, by comparing sequence information using the
GAP computer program, version 6.0, available from the University of Wisconsin Genetics Computer Group (UWGCG).
The GAP program utilizes the alignment method of Needleman and Wunsch (J. Mol. Biol. 48:443 (1970)), as revised
by Smith and Waterman (Adv. Appl. Math. 2:482 (1981)). Briefly, the GAP program defines similarity as the number
of aligned symbols (i.e., nucleotides or amino acids) which are similar, divided by the total number of symbols in the
shorter of the two sequences. The preferred default parameters for the GAP program include: (1) a unitary comparison
matrix (containing a value of 1 for identities and 0 for non-identities) and the weighted comparison matrix of Gribskov
and Burgess, Nucl. Acids Res. 14:6745 (1986), as described by Schwartz and Dayhoff, eds., ATLAS OF PROTEIN
SEQUENCE AND STRUCTURE, National Biomedical Research Foundation, pp. 353-358 (1979); (2) a penalty of 3.0
for each gap and an additional 0.10 penalty for each symbol in each gap; and (3) no penalty for end gaps.
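For illustration, the unitary-matrix settings listed above (1 for identities and 0 otherwise, a penalty of 3.0 per gap plus 0.10 per gapped symbol, no end-gap penalty, identity counted over the shorter sequence) can be reproduced with a global alignment using affine gap penalties, i.e., the Gotoh formulation of Needleman-Wunsch. This is a hedged Python sketch, not the UWGCG GAP program itself: the function name and test sequences are hypothetical, the Gribskov and Burgess weighted matrix is not included, and tie-breaking between equally scoring alignments may differ from GAP's.

```python
NEG = float("-inf")


def gap_percent_identity(a: str, b: str, gap_open: float = 3.0,
                         gap_extend: float = 0.10) -> float:
    """Global alignment with a unitary matrix (1 = identity, 0 = otherwise),
    gap cost gap_open + gap_extend * length, and free end gaps. Returns the
    number of identical aligned symbols as a percentage of the shorter sequence."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    n, m = len(a), len(b)
    first = gap_open + gap_extend  # cost of the first symbol of a new gap
    # M: a[i-1] paired with b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = 0.0  # leading end gap: no penalty
    for j in range(1, m + 1):
        Y[0][j] = 0.0  # leading end gap: no penalty

    def score(i, j):
        return 1.0 if a[i - 1] == b[j - 1] else 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = score(i, j) + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - first, X[i - 1][j] - gap_extend, Y[i - 1][j] - first)
            Y[i][j] = max(M[i][j - 1] - first, Y[i][j - 1] - gap_extend, X[i][j - 1] - first)

    # Trailing end gaps are free: the best alignment may end anywhere on the last row or column.
    best, state, i, j = NEG, "M", n, m
    for ei, ej in [(r, m) for r in range(n + 1)] + [(n, c) for c in range(m + 1)]:
        for name, mat in (("M", M), ("X", X), ("Y", Y)):
            if mat[ei][ej] > best:
                best, state, i, j = mat[ei][ej], name, ei, ej

    # Trace back, counting identical pairs; stop at row/column 0 (free leading gaps).
    identities = 0
    while i > 0 and j > 0:
        if state == "M":
            s = score(i, j)
            if s == 1.0:
                identities += 1
            state = next(name for name, mat in (("M", M), ("X", X), ("Y", Y))
                         if M[i][j] == s + mat[i - 1][j - 1])
            i, j = i - 1, j - 1
        elif state == "X":
            options = (("M", M[i - 1][j] - first), ("X", X[i - 1][j] - gap_extend),
                       ("Y", Y[i - 1][j] - first))
            state = next(name for name, v in options if v == X[i][j])
            i -= 1
        else:
            options = (("M", M[i][j - 1] - first), ("Y", Y[i][j - 1] - gap_extend),
                       ("X", X[i][j - 1] - first))
            state = next(name for name, v in options if v == Y[i][j])
            j -= 1
    return 100.0 * identities / min(n, m)


print(gap_percent_identity("ACDEFGHIKLMNPQ", "ACDEFGHKLMNPQ"))  # 100.0
print(gap_percent_identity("ACDE", "ACDF"))  # 75.0
```

In the first call the single internal deletion costs 3.1, which is cheaper than the mismatches an ungapped alignment would accumulate, so all 13 residues of the shorter sequence align to identical residues.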

Thus, one of ordinary skill in the art, given the teachings and guidance presented in the present specification,
will know how to add, delete or substitute other amino acid residues in other positions of a PPCA or pPPCA to obtain
substituted, deletional or additional variants thereof.

Non-limiting examples of substitutions of a PPCA or pPPCA domain or polypeptide of the invention are those
in which at least one amino acid residue in the protein molecule has been removed and a different residue added in its
place according to the following Table 2. The types of substitutions which can be made in the protein or peptide
molecule of the invention can be based on analysis of the frequencies of amino acid changes between a homologous
protein of different species, such as those presented in Figure 15. Based on such an analysis, alternative substitutions are
defined herein as exchanges within one of the following five groups:

1. Small aliphatic, nonpolar or slightly polar residues: Ala, Ser, Thr (Pro, Gly);

2. Polar, negatively charged residues and their amides: Asp, Asn, Glu, Gln;

3. Polar, positively charged residues: His, Arg, Lys;

4. Large aliphatic, nonpolar residues: Met, Leu, Ile, Val (Cys); and

5. Large aromatic residues: Phe, Tyr, Trp.
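The five exchange groups above can be encoded as a lookup table for screening candidate substitutions. In this minimal Python sketch (all names hypothetical), the residues listed in parentheses, Pro and Gly in group 1 and Cys in group 4, are assumed to count as ordinary group members.

```python
# Exchange groups from the list above, keyed by three-letter residue code.
# Assumption: parenthesized residues (Pro, Gly in group 1; Cys in group 4)
# are treated as ordinary members of their group.
SUBSTITUTION_GROUPS = {
    1: frozenset({"Ala", "Ser", "Thr", "Pro", "Gly"}),
    2: frozenset({"Asp", "Asn", "Glu", "Gln"}),
    3: frozenset({"His", "Arg", "Lys"}),
    4: frozenset({"Met", "Leu", "Ile", "Val", "Cys"}),
    5: frozenset({"Phe", "Tyr", "Trp"}),
}


def exchange_group(residue: str):
    """Return the group number (1-5) of a residue, or None if it is not listed."""
    for number, members in SUBSTITUTION_GROUPS.items():
        if residue in members:
            return number
    return None


def is_alternative_substitution(original: str, replacement: str) -> bool:
    """True when a residue is replaced by a different residue of the same group."""
    group = exchange_group(original)
    return (original != replacement and group is not None
            and group == exchange_group(replacement))


print(is_alternative_substitution("Asp", "Glu"))  # True
print(is_alternative_substitution("Asp", "Lys"))  # False
```

Whether a given exchange actually preserves activity still has to be confirmed experimentally, as the following paragraph notes.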

Most deletions and additions, and substitutions according to the invention are those which do not produce
radical changes in the characteristics of the protein or peptide molecule. "Characteristics" is defined in a non-inclusive
manner to define both changes in secondary structure, e.g., α-helix or β-sheet, as well as changes in physiological
activity, e.g., in biological activity assays. However, when the exact effect of the substitution, deletion, or addition is
to be confirmed, one skilled in the art will appreciate that the effect of at least one substitution, addition or deletion will
be evaluated by at least one PPCA or pPPCA screening assay, such as, but not limited to, immunoassays or bioassays,
to confirm at least one PPCA or pPPCA biological activity.

Surprisingly, a PPCA and/or a pPPCA is [...] to have serine carboxylase activity and corresponding structural features,
although having only about 30% sequence identity to wheat and yeast carboxypeptidases. These carboxylases are members
of the hydrolase fold family (Liao et al., [...]).

The serine carboxypeptidases have peptidase activity at acidic pH (pH 4.5-5.5), as well as deamidase and [...] activities
at [...] pH (reviewed in Breddam et al., Carlsberg Res. Commun. 51:83-128 (1986); Rawlings & Barrett, Methods Enzymol.
244:19-61 (1994)). Mutagenesis studies and [...] revealed [...] is that of lysosomal cathepsin A and has [...] substrates
such as the dipeptide Phe-Ala (Galjart et al., [...]). On the basis of sequence alignments with members of the serine
carboxylase family, mutagenesis studies [...] the catalytic triad in PPCA has been [...].

PPCA and pPPCA Expression for Isolation and Purification

A nucleic acid sequence encoding a PPCA or a pPPCA (Galjart et al., Cell 54:755-764 (1988)) can be
recombined with vector DNA in accordance with conventional techniques, including [...] termini for ligation, restriction
enzyme digestion to provide appropriate termini, filling in of cohesive ends as appropriate, [...] phosphatase treatment
to avoid [...].

Techniques for such manipulations are disclosed, e.g., in Sambrook et al., Molecular Cloning: A Laboratory Manual,
Second Edition, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1989); Ausubel et al., eds., Current Protocols
in Molecular Biology, Wiley Interscience, N.Y. (1988-1995), and are well known in the art.

A nucleic acid molecule, such as DNA, is said to be "capable of expressing" a polypeptide if [...] "operably linked"
to nucleotide sequences which encode the polypeptide. An operable linkage is a linkage in which the regulatory DNA
sequences and the DNA [...].
The invention accordingly encompasses the expression of a PPCA or a pPPCA in either prokaryotic or
eukaryotic cells, although eukaryotic expression is preferred. Preferred hosts are [...] bacteria, yeast, insect, fungi,
bird and mammalian cells either [...] including [...] sheep, horse, goat, dog or cat origin, but any other mammalian
cell can be used.

Preferred eukaryotic hosts include [...] insects, fungi, mammalian cells either in vivo or in tissue culture. [...] hosts
include, but are not limited to, insect cells, [...] Xenopus oocytes, cells of fibroblast origin such as VERO or
CHO-K1, or cells of lymphoid origin and their derivatives.

Mammalian cells provide post-translational modifications to protein molecules, including correct folding or
glycosylation at correct sites. Mammalian cells which can be useful as hosts include, but are not limited to, cells of
fibroblast origin such as NIH3T3, VERO or CHO-K1; cells of lymphoid origin, such as, but not limited to, the hybridoma
SP2/0-Ag14 or the murine myeloma P3-X63Ag8; hamster cell lines (e.g., CHO-K1 and progenitors, e.g., CHO-DUXB11);
and their derivatives. One preferred type of mammalian cell is a cell intended to replace the
function of the genetically deficient cells in vivo. Neuronally derived cells are preferred for gene therapy of disorders
of the nervous system. For a mammalian cell host, many possible vector systems are available for the expression of
at least one PPCA or pPPCA. A wide variety of transcriptional and translational regulatory sequences can be employed,
depending upon the nature of the host. The transcriptional and translational regulatory signals can be derived from viral
sources, such as, but not limited to, adenovirus, bovine papilloma virus, simian virus, or the like, where the regulatory
signals are associated with a particular gene which has a high level of expression. Alternatively, promoters from
mammalian expression products, such as, but not limited to, actin, collagen or myosin, can be employed for protein production.

When live insects are to be used, silk moth caterpillars and baculoviral vectors are presently preferred hosts
for large-scale PPCA or pPPCA production according to the invention. Production of PPCA or pPPCA in insects can
be achieved, for example, by infecting the insect host with a baculovirus engineered to express a transmembrane
polypeptide by methods known to those skilled in the related arts. See Ausubel, infra, §§ 16.8-16.11.

In a preferred embodiment, the introduced nucleotide sequence will be incorporated into a plasmid or viral
vector capable of autonomous replication in the recipient host. Any of a wide variety of vectors can be employed for
this purpose. See, e.g., Ausubel et al., infra, §§ 1.5, 1.10, 7.1, 7.3, 8.1, 9.6, 9.7, 13.4, 16.2, 16.6, and 16.8-16.11. Factors
of importance in selecting a particular plasmid or viral vector include: the ease with which recipient cells that contain
the vector can be recognized and selected from those recipient cells which do not contain the vector; the number of
copies of the vector which are desired in a particular host; and whether it is desirable to be able to "shuttle" the vector
between host cells of different species.

Different host cells have characteristic and specific mechanisms for the translational and post-translational
processing and modification (e.g., glycosylation, cleavage) of proteins. Appropriate cell lines or host systems can be
chosen to ensure the desired modification and processing of the foreign protein expressed. For example, expression in
a bacterial system can be used to produce an unglycosylated core protein product. Expression in yeast will produce a
glycosylated product. Expression in mammalian cells can be used to ensure "native" glycosylation of the heterologous
PPCA or pPPCA. Furthermore, different vector/host expression systems can effect processing reactions such as
proteolytic cleavages to different extents.

As discussed above, expression of PPCA or pPPCA in eukaryotic hosts requires the use of eukaryotic regulatory
regions. Such regions will, in general, include a promoter region sufficient to direct the initiation of RNA synthesis.
See, e.g., Ausubel, infra; Sambrook, infra.

Once the vector or nucleic acid molecule containing the construct(s) has been prepared for expression, the DNA
construct(s) can be introduced into an appropriate host cell by any of a variety of suitable means, e.g., transformation,
transfection, conjugation, protoplast fusion, electroporation, particle gun technology, calcium phosphate precipitation,
direct microinjection, and the like. After the introduction of the vector, recipient cells are grown in a selective medium,
which selects for the growth of vector-containing cells. Expression of the cloned gene molecule(s) results in the
production of a PPCA or pPPCA. This can take place in the transformed cells as such, or following the induction of
these cells to differentiate (for example, by administration of bromodeoxyuracil to neuroblastoma cells or the like).

A PPCA or pPPCA, or fragments thereof, of this invention can be obtained by expression from recombinant
DNA according to known methods. Alternatively, a PPCA or pPPCA can be purified from biological material. A PPCA
or a pPPCA can be purified from different mammalian tissues (e.g., human placenta, rat liver, mouse liver, pig kidney,
bovine testes, bovine liver, and the like) of various genera and species.

The PPCA or pPPCA can be isolated and purified in accordance with conventional method steps, such as
extraction, precipitation, chromatography, affinity chromatography, electrophoresis, or the like. For example, cells
expressing at least one PPCA or pPPCA at suitable levels can be collected by centrifugation or, with suitable buffers,
lysed, and the protein isolated by column chromatography, for example, on DEAE-cellulose, phosphocellulose,
polyribocytidylic acid-agarose or hydroxyapatite, or by electrophoresis or immunoprecipitation. Alternatively, a pPPCA
or PPCA can be isolated by the use of antibodies, such as, but not limited to, a PPCA- or pPPCA-specific antibody. Such
antibodies can be obtained by known method steps (see, e.g., Harlow and Lane, ANTIBODIES: A LABORATORY
MANUAL, Cold Spring Harbor Laboratory (1988); Coligan et al., eds., Current Protocols in Immunology, Greene
Publishing Assoc. and Wiley Interscience, N.Y. (1992, 1993), the contents of which references are entirely incorporated
herein by reference).

A PPCA or a pPPCA can be purified from different mammalian tissues (e.g., human placenta, rat liver, mouse
liver, pig kidney, bovine testes, bovine liver, and the like) of various genera and species, using known techniques such
as gel filtration, phase separation and affinity chromatography, e.g., using polyclonal or monoclonal antibodies specific
for a PPCA or pPPCA, according to known methods. See, e.g., Oxender et al., Protein Engineering, Liss, New York
(1986).

Overview of PPCA or pPPCA Purification and Crystallization Methods 

In general, a PPCA or pPPCA is isolated in soluble form (e.g., as a monomer or dimer) in sufficient purity and
concentration for crystallization. The PPCA or pPPCA is then assayed for biological activity (e.g., cathepsin A
activity) and for lack of aggregation (which interferes with crystallization). The purified PPCA or pPPCA preferably runs
as a single band for each monomer under reducing or nonreducing polyacrylamide gel electrophoresis (PAGE)
(nonreducing conditions are used to evaluate the presence of cysteine bridges).

The purified PPCA or pPPCA is preferably crystallized under varying conditions of at least one of the
following: pH, buffer type, buffer concentration, salt type, polymer type, polymer concentration, other precipitating
ligands and concentration of purified PPCA or pPPCA. See, e.g., known methods (Blundell et al., Protein
Crystallography, Academic Press, London (1976); Oxender, infra; McPherson, The Preparation and Analysis of Protein
Crystals, Wiley Interscience, N.Y. (1982)) or methods provided in a commercial kit, such as CRYSTAL SCREEN
(Hampton Research, Riverside, CA). The crystallized PPCA protein can optionally be tested for at least one PPCA
activity, and differently sized and shaped crystals are further tested for suitability for x-ray diffraction. Generally, larger
crystals provide better crystallographic data than smaller crystals, and thicker crystals provide better crystallographic
data than thinner crystals. See, e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff et al., Diffraction
Methods for Biological Macromolecules, Vols. 114-115, Methods in Enzymology, Academic Press, Orlando, FL (1985).
Protein Crystallization Methods 

The hanging drop method is preferably used to crystallize the purified protein. See, e.g., Blundell, infra;
Oxender, infra; McPherson, infra; Wyckoff, infra; Taylor et al., J. Mol. Biol. 226:1287-1290 (1992); Takimoto et al.
(1992), infra; CRYSTAL SCREEN, Hampton Research.

A mixture of the purified protein and precipitant can include the following: 

• pH (e.g., 7-9);

• buffer type (e.g., tromethamine (TRIZMA), sodium azide (NaN3), phosphate, sodium or cacodylate
acetates, imidazole, Tris HCl, sodium HEPES);

• buffer concentration (e.g., 1-100 mM);

• salt type (e.g., sodium azide, calcium chloride, sodium citrate, magnesium chloride, ammonium
acetate, ammonium sulfate, potassium phosphate, magnesium acetate, zinc acetate, calcium acetate);

• polymer type and concentration (e.g., polyethylene glycol (PEG) 1-50%, type 400-10,000);

• other additives (salts: potassium, sodium, tartrate, ammonium sulfate, sodium acetate, lithium sulfate,
sodium formate, sodium citrate, magnesium formate, sodium phosphate, potassium phosphate;
organics: 2-propanol; non-volatile: 2-methyl-2,4-pentanediol); β-octyl glucoside; and

• concentration of purified PPCA or pPPCA (e.g., 1.0-100 mg/ml).
See, e.g., CRYSTAL SCREEN, Hampton Research.

A non-limiting example of such crystallization conditions is the following:
• purified PPCA or pPPCA protein (e.g., 5 mg/ml);
• two solutions in serial mixtures:

(1) 40-80 mM TRIZMA, 0.05-2.0 mM NaN3;

(2) 2-30% polyethylene glycol (PEG) 8000 buffered with 40-80 mM TRIZMA and
0.05-2.0 mM NaN3;

• 0.05-0.5% β-octyl glucoside;

• at an overall pH of about 8.0-8.3.

The above mixtures are used and screened by varying at least one of pH, buffer type, buffer concentration,
precipitating salt type or additive or their concentrations, PEG type, PEG concentration, and protein concentration.
Crystals ranging in size from 0.1-0.9 mm are formed in 1-14 days. These crystals diffract x-rays to at least 10 Å
resolution, such as 0.15-10.0 Å, or any range or value therein, such as 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5,
2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4 or 3.5 Å, with a resolution of 3.5 Å or higher (i.e., a d-spacing of 3.5 Å or less)
being preferred for the highest resolution. In addition to diffraction patterns having this highest resolution, lower
resolution, such as 25-3.5 Å, can also be used. See, e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra.
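The screening described above amounts to enumerating combinations of the varied parameters. The Python sketch below builds a full-factorial grid around the non-limiting example conditions. The particular grid values and the fixed TRIZMA (60 mM), NaN3 (0.5 mM) and protein (5 mg/ml) settings are illustrative assumptions chosen inside the stated ranges, and the function name is hypothetical.

```python
from itertools import product


def build_screen(ph_values, peg8000_percent, octyl_glucoside_percent,
                 trizma_mM=60.0, nan3_mM=0.5, protein_mg_per_ml=5.0):
    """Full-factorial list of drop conditions over the varied parameters.

    The fixed defaults are illustrative values inside the ranges quoted in
    the example (40-80 mM TRIZMA, 0.05-2.0 mM NaN3, 5 mg/ml protein)."""
    return [
        {
            "pH": ph,
            "PEG8000_percent": peg,
            "octyl_glucoside_percent": og,
            "TRIZMA_mM": trizma_mM,
            "NaN3_mM": nan3_mM,
            "protein_mg_per_ml": protein_mg_per_ml,
        }
        for ph, peg, og in product(ph_values, peg8000_percent, octyl_glucoside_percent)
    ]


# Hypothetical grid: 4 pH values x 4 PEG 8000 levels x 3 detergent levels
screen = build_screen([8.0, 8.1, 8.2, 8.3], [2, 10, 20, 30], [0.05, 0.25, 0.5])
print(len(screen))  # 48
print(screen[0])
```

A full factorial grows quickly with each added variable; sparse-matrix kits such as the one cited above sample the space more economically.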
Protein Crystals 

Crystals appear after 1-14 days and continue to grow on subsequent days. Some of the crystals can optionally be
removed, washed, and assayed for biological activity (e.g., PPCA activity); crystals having such activity are preferred for
further characterizations. Other washed crystals are preferably run on a gel and stained, and those that migrate in the
same position as the purified PPCA or pPPCA are preferably used. From two to one hundred crystals are observed in
one drop, and various crystal forms can occur, such as, but not limited to, orthorhombic, bipyramidal, rhomboid, and
cubic. Initial x-ray analyses indicate that such crystals diffract at moderately high to high resolution. When fewer
crystals are produced in a drop, they can be much larger, e.g., 0.4-0.9 mm. See, e.g., Blundell, infra; Oxender, infra;
McPherson, infra; Wyckoff, infra.
PPCA and pPPCA X-ray Crystallography Methods

The crystals so produced for a PPCA or pPPCA are x-ray analyzed using a suitable x-ray source, and diffraction
patterns are obtained. Crystals are preferably stable for at least 10 hrs in the x-ray beam. Frozen crystals (e.g., -220
to -50 °C) are optionally used for longer x-ray exposures (e.g., 5-72 hrs), the crystals being relatively more stable to the
x-rays in the frozen state. To collect the maximum number of useful reflections, multiple frames are optionally collected
as the crystal is rotated in the x-ray beam, e.g., for 5-72 hrs. Larger crystals (>0.2 mm) are preferred, to increase the
resolution of the x-ray diffraction patterns obtained. Crystals are preferably analyzed using a synchrotron high-energy
x-ray source. Using frozen crystals, x-ray diffraction data are collected on crystals that diffract to at least a relatively
high resolution of 10-1.5 Å, with lower resolutions, such as 25-10 Å, also useful, sufficient to solve the three-dimensional
structure of a PPCA or pPPCA in considerable detail, as presented herein.

Passing an x-ray beam through a crystal produces a diffraction pattern as a result of the x-rays interacting with and
being scattered by the contents of the crystal. The diffraction pattern can be visualized using, e.g., an image plate or
film, resulting in an image with spots corresponding to the diffracted x-rays. The positions of the spots in the diffraction
pattern are used to determine parameters intrinsic to the crystal (such as unit cell parameters) and to gain information on
the packing of the molecules in the crystal. The intensities of the spots are related to the Fourier transform of the
molecules in the crystal and therefore contain information on each atom in the crystal and hence on the crystallized
molecule.
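The link between spot positions and the crystal lattice is Bragg's law, λ = 2d sin θ, where 2θ is the scattering angle. For a flat detector perpendicular to the beam at distance D from the crystal, a spot at radius r from the direct-beam position satisfies tan 2θ = r/D. The Python sketch below converts a spot position into the d-spacing (resolution) it samples; the function name, detector geometry and wavelength in the usage lines are hypothetical.

```python
import math


def spot_resolution(radius_mm: float, detector_distance_mm: float,
                    wavelength_angstrom: float) -> float:
    """d-spacing (Å) of a reflection recorded at radius r from the direct-beam
    position on a flat detector perpendicular to the beam.

    Geometry: tan(2θ) = r / D; Bragg's law: λ = 2 d sin θ.
    """
    if radius_mm <= 0:
        raise ValueError("the direct-beam position has no finite resolution")
    two_theta = math.atan2(radius_mm, detector_distance_mm)
    return wavelength_angstrom / (2.0 * math.sin(two_theta / 2.0))


# Hypothetical geometry: 1.0 Å synchrotron x-rays, detector 200 mm from the crystal
for r in (20.0, 100.0, 150.0):
    print(f"r = {r:5.1f} mm -> d = {spot_resolution(r, 200.0, 1.0):.2f} Å")
```

Spots farther from the beam centre correspond to smaller d-spacings, i.e., higher resolution, which is why the detector distance is chosen so that the outer edge of the detector reaches the crystal's diffraction limit.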

After data collection of diffraction patterns, the data are processed. This includes measuring the spots on each
diffraction pattern in terms of position and intensity. This information is processed (i.e., mathematical operations are
performed on the data, such as scaling, merging and converting the data from intensities of diffracted beams to
amplitudes) to yield a set of data in a form that can be used for further structure determination of the crystallized
molecule. The amplitudes of the diffracted x-rays are then combined with calculated phases to produce an
electron density map of the contents of the crystal. The structure of the molecules (as present in the crystal) is built
into this electron density map. The phases can be determined with various known techniques, one being molecular
replacement.
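The scaling, merging and intensity-to-amplitude steps above can be illustrated in a few lines. This Python sketch assumes per-frame scale factors are already known and that Miller indices have already been reduced to a unique set; real data-reduction programs refine the scales and apply the space-group symmetry themselves. Negative mean intensities are clamped to zero before taking the square root, a simplification of the statistical treatment used in practice. All names and values are hypothetical.

```python
import math
from collections import defaultdict


def merge_to_amplitudes(observations, frame_scale):
    """Scale, merge and convert intensities to amplitudes.

    observations: iterable of (frame_id, (h, k, l), intensity), with indices
    already reduced to a unique set (assumption).
    frame_scale: dict mapping frame_id -> multiplicative scale factor.
    Returns {(h, k, l): amplitude}, where amplitude = sqrt(mean scaled intensity).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for frame_id, hkl, intensity in observations:
        sums[hkl] += frame_scale[frame_id] * intensity  # scaling
        counts[hkl] += 1
    amplitudes = {}
    for hkl, total in sums.items():
        mean_intensity = total / counts[hkl]  # merging
        # |F| is proportional to sqrt(I); negative means are clamped (simplification)
        amplitudes[hkl] = math.sqrt(max(mean_intensity, 0.0))
    return amplitudes


obs = [(0, (1, 0, 0), 100.0), (1, (1, 0, 0), 50.0), (0, (0, 1, 1), -4.0)]
print(merge_to_amplitudes(obs, {0: 1.0, 1: 2.0}))  # {(1, 0, 0): 10.0, (0, 1, 1): 0.0}
```

The two observations of (1, 0, 0) agree once frame 1 is scaled by 2.0, so the merged intensity is 100 and the amplitude is 10.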

For the molecular replacement technique, a known three-dimensional structure thought to share structural homology
with the structure to be determined is used to generate, after calculations, a first set of initial phases. These
phases are then combined with the diffraction data of the molecule whose structure is to be solved.
The result is an electron density map of the molecules in the crystal from which the diffraction patterns originate.
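Once phases are available, for example from molecular replacement, the electron density is obtained as a Fourier synthesis of the amplitudes and phases. The Python sketch below evaluates the one-dimensional case, rho(x) = (1/a)[F(0) + 2 * sum over h > 0 of |F(h)| cos(phi(h) - 2*pi*h*x)], assuming Friedel's law F(-h) = F(h)* and the sign convention F(h) = sum_j f_j exp(2*pi*i*h*x_j). Actual maps are three-dimensional sums over (h, k, l), normally computed with a fast Fourier transform. The function name and test values are hypothetical.

```python
import math


def density_1d(amplitudes, phases_deg, x_frac, cell_length=1.0):
    """Electron density at fractional coordinate x_frac from a 1-D Fourier synthesis.

    amplitudes[h] and phases_deg[h] give |F(h)| and phi(h) for h = 0, 1, 2, ...
    Assumes Friedel's law, F(-h) = conjugate(F(h)), so each h > 0 term appears twice.
    """
    total = amplitudes[0] * math.cos(math.radians(phases_deg[0]))  # F(0) is real
    for h in range(1, len(amplitudes)):
        phi = math.radians(phases_deg[h])
        total += 2.0 * amplitudes[h] * math.cos(phi - 2.0 * math.pi * h * x_frac)
    return total / cell_length


# Hypothetical data: one point scatterer at x = 0.25 gives phi(h) = 360 * h * 0.25 degrees
amps = [1.0] * 6
phases = [360.0 * h * 0.25 for h in range(6)]
for x in (0.0, 0.25, 0.5, 0.75):
    print(x, round(density_1d(amps, phases, x), 3))
```

The synthesis peaks at x = 0.25, where the scatterer sits; with the wrong phases the same amplitudes would place density elsewhere, which is why phase determination is the central problem.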

The phases can be further optimized using a technique called density modification, which allows electron
density maps of better quality to be produced, facilitating interpretation and model building therein. The atomic model
is then refined by allowing the atoms in the model to move in order to match the diffraction data as well as possible
while continuing to satisfy stereochemical constraints (sensible bond lengths, bond angles and the like). See, e.g.,
Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra.
Computer Related Embodiments 

An amino acid sequence of a PPCA or pPPCA and/or atomic coordinate/x-ray diffraction data, useful for
computer structure determination of a PPCA, pPPCA or a portion thereof, can be "provided" in a variety of media
to facilitate use thereof. As used herein, "provided" refers to a manufacture which contains a PPCA or pPPCA amino acid
sequence and/or atomic coordinate/x-ray diffraction data of the present invention, e.g., the amino acid sequence provided
in Figures 13-15, a representative fragment thereof, or an amino acid sequence having at least 80-100% overall identity
to a 5-542 amino acid fragment of an amino acid sequence of Figures 13-15. Such a manufacture provides the amino acid
sequence and/or atomic coordinate/x-ray diffraction data in a form which allows a skilled artisan to analyze and
determine the three-dimensional structure of a PPCA, a pPPCA or a subdomain thereof.

In one application of this embodiment, the amino acid sequence and/or atomic coordinate/x-ray diffraction data
of a PPCA, pPPCA, or at least one subdomain thereof, of the present invention are recorded on computer readable media.
As used herein, "computer readable media" refers to any medium which can be read and accessed directly by a computer.
Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and
magnetic tape; optical storage media such as optical discs or CD-ROM; electrical storage media such as RAM and ROM;
and hybrids of these categories such as magnetic/optical storage media. A skilled artisan can readily appreciate how any
of the presently known computer readable media can be used to create a manufacture comprising computer readable
medium having recorded thereon an amino acid sequence and/or atomic coordinate/x-ray diffraction data of the present
invention.

As used herein, "recorded" refers to a process for storing information on computer readable medium. A skilled
artisan can readily adopt any of the presently known methods for recording information on computer readable medium
to generate manufactures comprising an amino acid sequence and/or atomic coordinate/x-ray diffraction data information
of the present invention.

A variety of data storage structures are available to a skilled artisan for creating a computer readable medium
having recorded thereon an amino acid sequence and/or atomic coordinate/x-ray diffraction data of the present invention.
The choice of the data storage structure will generally be based on the means chosen to access the stored information.
In addition, a variety of data processor programs and formats can be used to store the sequence and x-ray data
information of the present invention on computer readable medium. The sequence information can be represented in
a word processing text file, formatted in commercially-available software such as WordPerfect and MICROSOFT Word,
or represented in the form of an ASCII file, stored in a database application, such as DB2, Sybase, Oracle, or the like.
A skilled artisan can readily adapt any number of data processor structuring formats (e.g., text file or database) in order
to obtain computer readable medium having recorded thereon the information of the present invention.
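As one example of an ASCII layout for atomic coordinates (an assumption; the specification does not name a format), the Protein Data Bank (PDB) format stores each atom on a fixed-column ATOM or HETATM record. A minimal Python reader for the fields most often needed for modeling might look like this; the class and function names are hypothetical, and fields such as occupancy and B-factor are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_pdb_atoms(text: str) -> list:
    """Read ATOM/HETATM records using the fixed PDB columns
    (1-based): serial 7-11, name 13-16, resName 18-20, chainID 22,
    resSeq 23-26, x 31-38, y 39-46, z 47-54."""
    atoms = []
    for line in text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # skip HEADER, REMARK, TER, END, ...
        atoms.append(Atom(
            serial=int(line[6:11]),
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain=line[21:22].strip(),
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        ))
    return atoms


record = "ATOM      1  OG  SER A 150      10.000  20.000  30.000  1.00 20.00           O"
print(parse_pdb_atoms(record))
```

Because the columns are fixed, a plain-text file of this kind can be read by any of the text or database tools mentioned above without a dedicated library.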

By providing a PPCA or pPPCA sequence and/or atomic coordinates based on x-ray diffraction data stored on
computer readable media, a skilled artisan can routinely access the sequence and atomic coordinate or x-ray diffraction
data to model a PPCA, pPPCA, a subdomain thereof, or a ligand thereof. Computer algorithms are publicly and
commercially available which allow a skilled artisan to access this data provided on a computer readable medium and
analyze it for structure determination and/or RDD. See, e.g., Biotechnology Software Directory, Mary Ann Liebert
Publ., New York (1995).

The present invention further provides systems, particularly computer-based systems, which contain the
sequence and/or diffraction data described herein. Such systems are designed to do structure determination and RDD
for a PPCA, pPPCA or at least one subdomain thereof. Non-limiting examples are microcomputer workstations
available from Silicon Graphics Incorporated and Sun Microsystems running Unix-based, Windows NT or IBM OS/2
operating systems.

As used herein, "a computer-based system" refers to the hardware means, software means, and data storage
means used to analyze the sequence and/or atomic coordinate/x-ray diffraction data of the present invention. The
minimum hardware means of the computer-based systems of the present invention comprises a central processing unit
(CPU), input means, output means, and data storage means. A skilled artisan can readily appreciate which of the
currently available computer-based systems are suitable for use in the present invention. A monitor is optionally provided
to visualize structure data.

As stated above, the computer-based systems of the present invention comprise a data storage means having 
stored therein a PPCA, pPPCA or fragment sequence and/or atomic coordinate/x-ray diffraction data of the present 
invention and the necessary hardware means and software means for supporting and implementing an analysis means. 
As used herein, "data storage means" refers to memory which can store sequence or atomic coordinate/x-ray diffraction
data of the present invention, or a memory access means which can access manufactures having recorded thereon the 
sequence or x-ray data of the present invention. 

As used herein, "search means" or "analysis means" refers to one or more programs which are implemented 
on the computer-based system to compare a target sequence or target structural motif with the sequence or x-ray data 
stored within the data storage means. Search means are used to identify fragments or regions of a PPCA or pPPCA 
which match a particular target sequence or target motif. A variety of known algorithms are publicly disclosed, and a
variety of commercially available software packages for conducting searches can be used in the computer-based
systems of the present invention. A skilled artisan can readily recognize that any one of the available algorithms or
implementing software packages for conducting computer analyses can be adapted for use in the present
computer-based systems.
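A search means of the kind defined above can be illustrated in a few lines. The following is a minimal sketch only, assuming that the stored data are a one-letter amino acid sequence and that a target motif can be written as a regular expression. The function name `find_motif` and the stored sequence are hypothetical (the sequence is not the PPCA sequence of Figures 13-15), and the G-x-S-x-G pattern is simply the generic serine-hydrolase consensus used as an example target.

```python
import re
from typing import List, Tuple

def find_motif(sequence: str, motif: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for every match of `motif`.

    The regular expression is wrapped in a lookahead so that overlapping
    matches are reported as well.
    """
    pattern = re.compile(f"(?=({motif}))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]

# Hypothetical stored sequence; not taken from the PPCA sequence data.
stored = "MKLGESAGWQGYSLGLL"
print(find_motif(stored, "G.S.G"))
```

A search means for structural (three-dimensional) target motifs would compare coordinates rather than strings; this sketch covers only the sequence case.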

As used herein, "a target structural motif," or "target motif," refers to any rationally selected sequence or 
combination of sequences in which the sequence(s) are chosen based on a three-dimensional configuration or electron 
density map which is formed upon the folding of the target motif. There are a variety of target motifs known in the art. 
Protein target motifs include, but are not limited to, enzymic active sites, structural subdomains, epitopes, functional 
domains and signal sequences. A variety of structural formats for the input and output means can be used to input and 
output the information in the computer-based systems of the present invention. 

A variety of comparing means can be used to compare a target sequence or target motif with the data storage 
means to identify structural motifs or interpret electron density maps derived in part from the atomic coordinate/x-ray
diffraction data. A skilled artisan can readily recognize that any one of the publicly available computer modeling 
programs can be used as the search means for the computer-based systems of the present invention. 

One application of this embodiment is provided in Figure 22. Figure 22 provides a block diagram of a 
computer system 102 that can be used to implement the present invention. The computer system 102 includes a 
processor 106 connected to a bus 104. Also connected to the bus 104 are a main memory 108 (preferably implemented 
as random access memory, RAM) and a variety of secondary storage memory 110, such as a hard drive 112 and a removable
medium storage device 114, as well as a monitor 120. The removable medium storage device 114 may represent, for example, a
floppy disk drive, a CD-ROM drive, a magnetic tape drive, etc. A removable storage medium 116 (such as a floppy
disk, a compact disk, a magnetic tape, etc.) containing control logic and/or data recorded therein may be inserted into
the removable medium storage device 114. The computer system 102 includes appropriate software for reading the
control logic and/or the data from the removable storage medium 116 once it is inserted in the removable medium
storage device 114.

Amino acid, encoding nucleotide or other sequence and/or atomic coordinate/x-ray diffraction data of the
present invention may be stored in a well-known manner in the main memory 108, any of the secondary storage devices
110, and/or a removable storage medium 116. Software for accessing and processing the amino acid sequence and/or
atomic coordinate/x-ray diffraction data (such as search tools, comparing tools, etc.) resides in main memory 108 during
execution. The monitor 120 is optionally used to visualize the structure data.
Structure Determination 

One or more computational steps, computer programs and/or computer algorithms are used to build a molecular
3-D model of a PPCA or pPPCA, using amino acid sequence data from Figures 13-15 (or variants thereof) and/or atomic
coordinate/x-ray diffraction data, as presented herein.

In x-ray crystallography, x-ray diffraction data and phases are combined to produce electron density maps in
which the three-dimensional structure of a PPCA or pPPCA is then built or modeled. This structure can then be used
for RDD of modulators of at least one PPCA- or pPPCA-related activity that is relevant to at least one PPCA- or
pPPCA-related pathology.

Density Modification and Map Interpretation. Electron density maps can be calculated using programs such
as those from the CCP4 computing package (SERC (UK) Collaborative Computing Project 4, Daresbury Laboratory,
UK, 1979). Cycles of two-fold averaging can further be used, e.g., with the program RAVE (Kleywegt & Jones, in
Bailey et al., eds., First Map to Final Model, SERC Daresbury Laboratory, UK, pp. 59-66 (1994)), together with gradual
model expansion. For map visualization and model building, a program such as "O" (Jones (1991), infra) can be used.

Refinement and Model Validation. Rigid body and positional refinement can be carried out using a program
such as X-PLOR (Brunger (1992), infra), e.g., with the stereochemical parameters of Engh and Huber (Acta Cryst.
A47:392-400 (1991)). If, at this stage, the model in the averaged maps still lacks residues (e.g., at least 5-10 per
subunit), then some or all of the missing residues can be incorporated into the model during additional cycles of
positional refinement and model building. The refinement can start using lower-resolution data (e.g., from 25-10 Å to
10-3.0 Å) and then be gradually extended to include data from 12-6 Å to 3.0-1.5 Å. B-values (also termed temperature
factors) for individual atoms can be refined once data to 2.8 Å or higher resolution (e.g., up to 1.5 Å) have been added.
Subsequently, waters can be gradually added. A program such as ARP (Lamzin and Wilson, Acta Cryst. D49:129-147 (1993))
can be used to add crystallographic waters and as a tool to check for bad areas in the model. Programs such as PROCHECK
(Laskowski et al., J. Appl. Cryst. 26:283-291 (1993)), WHAT IF (Vriend, J. Mol. Graph. 8:52-56 (1990)) and PROFILE
3D (Lüthy et al., Nature 356:83-85 (1992)), as well as the geometrical analysis generated by X-PLOR, can be used
to check the structure for errors. A program such as DSSP can be used to assign the secondary structure elements
(Kabsch and Sander (1983), infra).

The structure of a PPCA or pPPCA can thus be solved by molecular replacement, e.g., by
using X-PLOR (Brunger (1992), infra). A partial search model for the monomer can be constructed using a related
protein, such as the wheat serine carboxypeptidase structure (Liao et al. (1992), infra). The rotation and translation
functions can be solved to yield orientations and positions for the subunits in the crystallographic asymmetric unit.
This allows phases to be determined that, when combined with information from the x-ray diffraction patterns, allow
electron density maps of a PPCA or pPPCA to be calculated. The atomic model is then built using these electron density maps.
Cyclical two-fold density averaging can also be done to improve the electron density maps using a suitable program
(e.g., RAVE), and model expansion can also be used to add missing residues for each monomer, resulting in a model with
95-99.9% of the total number of residues. The model can be refined in a program such as X-PLOR (Brunger (1992),
supra) to a suitable crystallographic R-factor. The model data are then saved on computer readable media for use in
further analysis, such as rational drug design.
Rational Design of Drugs that Interact with the PPCA or pPPCA

The determination of the three-dimensional structure of a PPCA or pPPCA, as described herein, provides a
basis for the design of new and specific ligands for the diagnosis and/or treatment of at least one PPCA- or
pPPCA-related pathology.

Several approaches can be taken for the use of the crystal structure of a PPCA or pPPCA in the rational design
of ligands of this protein. A computer-assisted, manual examination of the active site structure is optionally done.
Software such as GRID (Goodford, J. Med. Chem. 28:849-857 (1985)), a program that determines probable
interaction sites between probes with various functional group characteristics and the enzyme surface, is used to
analyze the active site and to determine structures of inhibiting compounds. The program calculations, with suitable
inhibiting groups on molecules (e.g., protonated primary amines) as the probe, are used to identify potential hotspots
around accessible positions at suitable energy contour levels. Suitable ligands, as inhibiting or stimulating modulating
compounds or compositions, are then tested for modulating activities of at least one PPCA or pPPCA.
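The idea of picking hotspots at an energy contour level can be pictured with a minimal sketch. It assumes that probe interaction energies have already been computed on a regular grid (here a hypothetical dictionary keyed by integer grid indices, energies in kcal/mol) and treats a "hotspot" as a grid point at or below the chosen contour level that is also lower than each of its six face neighbours. This is an illustrative reading of the step, not the GRID program's own algorithm or file format; `find_hotspots` and the example grid are hypothetical.

```python
from typing import Dict, List, Tuple

GridPoint = Tuple[int, int, int]

# The six face neighbours of a point on a regular 3-D grid.
NEIGHBOUR_OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                     (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def find_hotspots(energy: Dict[GridPoint, float], contour: float) -> List[GridPoint]:
    """Grid points at or below `contour` that are lower than every computed neighbour."""
    hotspots = []
    for (i, j, k), e in energy.items():
        if e > contour:
            continue  # interaction not favourable enough at this contour level
        neighbours = [energy.get((i + di, j + dj, k + dk))
                      for di, dj, dk in NEIGHBOUR_OFFSETS]
        # Neighbours missing from the grid (None) are ignored.
        if all(n is None or e < n for n in neighbours):
            hotspots.append((i, j, k))
    return sorted(hotspots)

# Hypothetical probe energies (kcal/mol) on a tiny grid.
grid = {(0, 0, 0): -1.0, (1, 0, 0): -6.0, (2, 0, 0): -2.0, (5, 5, 5): -8.0}
print(find_hotspots(grid, contour=-5.0))
```

Lowering the contour level keeps only the most favourable sites, which is one way of ranking candidate positions for inhibiting groups.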

A diagnostic or therapeutic PPCA or pPPCA modulating ligand of the present invention can be, but is not
limited to, at least one selected from a nucleic acid, a compound, a protein, an element, a lipid, an antibody, a saccharide,
an isotope, a carbohydrate, an imaging agent, a lipoprotein, a glycoprotein, an enzyme, a detectable probe, and an antibody
or fragment thereof, or any combination thereof, which can be detectably labeled in the same manner as antibodies. Such
labels include, but are not limited to, enzymatic labels, radioisotopes or radioactive compounds or elements, fluorescent
compounds or metals, chemiluminescent compounds and bioluminescent compounds. Alternatively, any other known
diagnostic or therapeutic agent can be used in a method of the invention. 

After preliminary experiments are done to determine the K_m of the substrate for each enzyme activity of a
PPCA or pPPCA, the time dependence of modulation is assessed and ligand K_i values are determined (e.g., by the method of
Henderson, Biochem. J. 127:321-333 (1972)). For example, the ligand (or blank where appropriate) and enzyme
are pre-incubated in buffer. Reactions are initiated by the addition of substrate. Aliquots are removed over a suitable
time course and each is quenched by adding a suitable quenching solution (e.g., sodium hydroxide in
aqueous ethanol). The concentration of product is determined, e.g., fluorometrically, using a spectrometer. Plots of
fluorescence against time can be close to linear over the assay period, and are used to obtain values for the initial velocity
in the presence (V_i) or absence (V_0) of ligand. Because error is present on both axes of a Henderson plot, the plot is
inappropriate for standard linear regression analysis (Leatherbarrow, Trends Biochem. Sci. 15:455-458 (1990)). Therefore,
K_i values are obtained from the data by fitting to a modified version of the Henderson equation for competitive inhibition:

$$Q r^{2} + (E - Q - I)\,r - E = 0$$

where (using the notation of Henderson, Biochem. J. 127:321-333 (1972)) r = V_0/V_i, E is the total enzyme
concentration and I is the total ligand concentration.

This equation is solved for the positive root with the constraint that

$$Q = K_i \left( \frac{A_t + K_m}{K_m} \right)$$

where A_t is the substrate concentration, using PROC NLIN from SAS (SAS Institute Inc., Cary, North Carolina, USA),
which performs nonlinear regression using least-squares techniques. The iterative method used is optionally the
multivariate secant method, which is similar to the Gauss-Newton method except that the derivatives in the Taylor series
are estimated from the history of iterations rather than supplied analytically. A suitable convergence criterion is
optionally used, e.g., a change in the loss function between iterations that falls below a preset small value.
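The fitting step can be illustrated with a minimal sketch. It assumes that V_0/V_i ratios have been measured at several total ligand concentrations, that K_m, the substrate concentration and the total enzyme concentration are known, and that all concentrations are in molar units. In place of the multivariate secant method of SAS PROC NLIN, it substitutes a golden-section search over log10(K_i), which suffices here because only one parameter is fitted. The function names, search bounds and example values are hypothetical.

```python
import math

def velocity_ratio(ki, km, s, e_t, i_t):
    """r = V0/Vi, the positive root of Q r^2 + (E - Q - I) r - E = 0."""
    q = ki * (s + km) / km                 # Q = Ki((A + Km)/Km)
    b = e_t - q - i_t
    disc = b * b + 4.0 * q * e_t           # > 0 whenever Q > 0 and E > 0
    return (-b + math.sqrt(disc)) / (2.0 * q)

def sum_sq(ki, km, s, e_t, data):
    """Residual sum of squares between predicted and observed r values."""
    return sum((velocity_ratio(ki, km, s, e_t, i_t) - r_obs) ** 2
               for i_t, r_obs in data)

def fit_ki(km, s, e_t, data, lo=1e-12, hi=1e-3, tol=1e-6):
    """Golden-section search for the Ki in [lo, hi] that minimises sum_sq."""
    def loss(log_ki):
        return sum_sq(10.0 ** log_ki, km, s, e_t, data)

    g = (math.sqrt(5.0) - 1.0) / 2.0       # inverse golden ratio
    a, b = math.log10(lo), math.log10(hi)  # search on a log10 scale
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = loss(c), loss(d)
    while b - a > tol:
        if fc < fd:                        # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = loss(c)
        else:                              # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = loss(d)
    return 10.0 ** ((a + b) / 2.0)

# Hypothetical, noise-free data generated with Ki = 5 nM (all values in M).
KM, S, E_T = 1e-5, 1e-5, 1e-9
data = [(i, velocity_ratio(5e-9, KM, S, E_T, i)) for i in (1e-9, 5e-9, 2e-8, 1e-7)]
print(f"fitted Ki = {fit_ki(KM, S, E_T, data):.3g} M")
```

With no ligand (I = 0) the positive root is exactly r = 1, as expected. With real, noisy data an estimate of the uncertainty in K_i would also be needed, which this sketch does not provide.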
Once modulating ligands are found and isolated or synthesized, crystallographic studies of the compounds
complexed to a PPCA or pPPCA can be performed. As a non-limiting example, PPCA or pPPCA crystals are soaked
for 2 days in 0.01-100 mM ligand, and x-ray diffraction data are collected on an area detector and/or an image plate
detector (e.g., a MAR image plate detector) using a rotating anode x-ray source. Data are collected to as high a resolution
as possible, e.g., to a resolution limit of 1.5-3.5 Å. An atomic model of the inhibitor is built into the difference
Fourier map. The model can be refined to adjust the atomic positions to improve the fit with the
electron density maps, while maintaining correct stereochemistry. The model will preferably have low r.m.s.
deviations from ideal bond lengths and angles, as well as a low R-factor (preferably less
than about 25-35%, such as less than about 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, or 25%).
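The two model-quality measures named above can be sketched as follows, assuming observed and calculated structure-factor amplitudes are already paired reflection by reflection and that ideal bond lengths are available per bond. The R-factor here uses a single least-squares scale factor; refinement programs may scale differently. The function names, amplitudes, atom coordinates and the Engh-Huber-like ideal values are hypothetical illustrations.

```python
import math

def r_factor(f_obs, f_calc):
    """Crystallographic R = sum|Fo - k*Fc| / sum Fo.

    k is the least-squares scale factor that puts Fc on the scale of Fo.
    """
    k = sum(o * c for o, c in zip(f_obs, f_calc)) / sum(c * c for c in f_calc)
    return sum(abs(o - k * c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)

def rms_bond_deviation(bonds, coords, ideal):
    """r.m.s. deviation of model bond lengths from their ideal values."""
    squared = [(math.dist(coords[a], coords[b]) - ideal[(a, b)]) ** 2
               for a, b in bonds]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical amplitudes and a three-atom backbone fragment, for illustration.
f_obs = [120.0, 85.0, 60.0, 33.0]
f_calc = [110.0, 90.0, 52.0, 40.0]
coords = {"N": (0.0, 0.0, 0.0), "CA": (1.46, 0.0, 0.0), "C": (1.46, 1.52, 0.0)}
ideal = {("N", "CA"): 1.458, ("CA", "C"): 1.525}   # Engh-Huber-like values (Å)

r = r_factor(f_obs, f_calc)
print(f"R = {r:.3f}, below the 25% target: {r < 0.25}")
print(f"rms(bonds) = {rms_bond_deviation(list(ideal), coords, ideal):.4f} Å")
```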

Direct measurements of enzyme inhibition provide further confirmation that the modeled ligands are
modulators of at least one biological activity of a PPCA or a pPPCA. As a non-limiting example, a modification (Chong
et al., Biochim. Biophys. Acta 1077:65-71 (1991)) of the fluorometric assay of Potier et al. (Anal. Biochem. 94:287-296
(1979)) is optionally used to measure neuraminidase inhibition or stimulation, optionally including determination
of inhibition constants (K_i). Other suitable PPCA activity assays include, e.g., cathepsin A activity (Galjart et al., J. Biol.
Chem. 266:14754-14762 (1991)), endothelin I deamidase activity (Jackman et al., J. Biol. Chem. 267:2872-2875 (1992))
and tachykinin deamidase activity (Jackman et al., J. Biol. Chem. 265:11265-11272 (1990)).

Ligands of a PPCA or pPPCA, based on the crystal structure of this enzyme, are thus also provided by the
present invention. A PPCA or pPPCA ligand is any molecule, compound or composition that is capable of associating
with a PPCA or pPPCA and optionally modulating at least one function or structural feature of a PPCA or pPPCA.
Preferably, a PPCA or pPPCA ligand modulates at least one biological activity of a PPCA or pPPCA. Demonstration
of clinically useful levels of activity, e.g., in vivo, is also important. PPCA or pPPCA inhibitors can be evaluated for
biological activity in animal models (e.g., rat, mouse, rabbit) using various oral and parenteral routes of administration.
Using this approach, modulation of a PPCA or pPPCA is expected to occur in suitable animal models with the ligands
discovered by structure determination and x-ray crystallography.

Evaluation of Therapeutic Potentials of Compositions via a PPCA Animal Model

The present invention also provides methods for identifying diagnostic or therapeutic ligands of PPCA or
pPPCA via computer RDD, to treat a PPCA-related pathology. Generally, a method for determining the therapeutic or
diagnostic use of a PPCA or pPPCA modulating ligand, to treat a PPCA-related pathology, comprises the steps of
administering a known dose of at least one ligand-containing composition to an animal model having a phenotype
corresponding to a PPCA-related pathology, monitoring the appropriate biological or biochemical parameters, and
comparing the results in treated animals to those in untreated animals. Results indicating the onset or presence of a
PPCA-related pathology are generally referred to herein as "symptoms" of the disease. See, e.g., U.S. Appl. No.
08/397,693, filed March 2, 1995, which is entirely incorporated herein by reference.

Appropriate biological and biochemical parameters that reflect the onset and progression of a PPCA-related
pathology include, but are not limited to, (1) gross biological parameters, e.g., physical appearance (i.e., flattening of
the face, rough haircoat and/or subcutaneous swelling in affected animals) or growth (reduced weight gain); (2) gross
behavioral parameters, e.g., lack of coordination; (3) biochemical assays, e.g., assays of cathepsin A,
N-acetyl-α-neuraminidase or β-galactosidase activities in primary cultures of skin fibroblasts or tissue homogenates; and
(4) histopathological studies (visceromegaly, i.e., enlarged liver and spleen; accumulation of secondary vacuoles in kidney
tissues; etc.).

A first method of evaluating the therapeutic potential of a composition using the transgenic non-human animals
of the invention comprises the steps of:

(1) Administering a known dose of the composition to a first non-human animal having a
phenotype corresponding to a human PPCA-related pathology;
(2) Detecting the time of onset of symptoms in the first non-human animal; and
(3) Comparing the time of onset of symptoms in the first non-human animal to the time of onset
of symptoms in a second non-human animal having a phenotype corresponding to a human PPCA-related
pathology, which has not been exposed to the composition;
wherein a statistically significant delay in the time of onset of symptoms in the first non-human animal relative to the
time of onset of the symptoms in the second non-human animal indicates the potential of the composition for treating
a PPCA-related pathology.
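The text does not name the statistical test used to judge a "statistically significant delay". One possible choice, shown here as a minimal sketch, is an exact one-sided permutation test on the difference in mean onset time between treated and untreated animals. The function name and the onset times (in days) are hypothetical, and enumerating every relabelling is only practical for small groups.

```python
from itertools import combinations
from statistics import mean

def onset_delay_p_value(treated, untreated):
    """Exact one-sided permutation p-value for a later mean onset in `treated`.

    Every way of relabelling the pooled animals into groups of the original
    sizes is enumerated, so this is only practical for small groups.
    """
    pooled = list(treated) + list(untreated)
    n = len(treated)
    observed = mean(treated) - mean(untreated)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # A small tolerance guards against floating-point ties.
        if mean(group_a) - mean(group_b) >= observed - 1e-12:
            hits += 1
    return hits / total

# Hypothetical onset times (days) for treated and untreated animals.
treated = [30, 32, 35, 31, 33]
untreated = [20, 22, 21, 23, 19]
print(f"one-sided p = {onset_delay_p_value(treated, untreated):.4f}")
```

The same function applies to the second method below if "extent of symptoms" scores are passed with the groups swapped, so that a decrease in the treated group counts as the favourable direction.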

A second method of evaluating the therapeutic potential of a composition using the non-human animals of the 
invention comprises the steps of: 

(1) Administering a known dose of the composition to a first non-human animal having a 
phenotype corresponding to a human PPCA-related pathology at an initial time, t0;

(2) Determining the extent of symptoms in the first non-human animal at a later time, t1; and

(3) Comparing, at t1, the extent of symptoms in the first non-human animal to the extent of
symptoms in a second non-human animal having a phenotype corresponding to a human PPCA-related
pathology, which has not been exposed to the composition at t0;

wherein a statistically significant decrease in the extent of symptoms at t1 in the first non-human animal relative to the
extent of the symptoms at t1 in the second non-human animal indicates the potential of the composition for treating a
PPCA-related pathology.

In the above methods, the composition being tested may comprise a chemical compound administered by 
circulatory injection or oral ingestion. The composition being evaluated may alternatively comprise a polypeptide 
administered by circulatory injection of an isolated or recombinant bacterium or virus that is live or attenuated, wherein 
the polypeptide is present on the surface of the bacterium or virus prior to injection, or a polypeptide administered by 
circulatory injection of an isolated or recombinant bacterium or virus capable of reproduction within a non-human 
animal, and the polypeptide is produced within a non-human animal by genetic expression of a DNA sequence encoding 
the polypeptide. Alternatively, the composition being evaluated may comprise one or more nucleic acids, including a
gene from the human genome or a processed RNA transcript thereof. Similarly, the composition being evaluated may 
comprise cells removed from a mammal and genetically engineered to overexpress a lysosomal protein or some other 
therapeutic polypeptide. 

Once the PPCA modulating ligand has been shown to be effective in an animal model, it can then be tested in 
human clinical trials, according to known method steps. 

In the above methods, delivery of the composition being tested to non-human animals is achieved via means
appropriate for the composition being tested, e.g., by diet; by intermittent or continuous intravenous injection of one or
more of the compositions or of a liposome (Rahman and Schein, in Liposomes as Drug Carriers, Gregoriadis, ed., John
Wiley, New York (1988), pages 381-400; Gabizon, A., in Drug Carrier Systems, Vol. 9, Roerdink et al., eds., John
Wiley, New York (1989), pages 185-212) or microparticle (Tice et al., U.S. Patent 4,542,025 (Sep. 17, 1985))
formulation comprising one or more of the compositions; via subdermal implantation of drug-polymer conjugates
(Duncan, R., Anti-Cancer Drugs 3:175-210 (1992)); via microparticle bombardment (Sanford et al., U.S. Patent
4,945,050 (Jul. 31, 1990)); via infusion pumps (Blackshear and Rohde, in Drug Carrier Systems, Vol. 9, Roerdink et
al., eds., John Wiley, New York (1989), pages 293-310); or by other appropriate means known in the art (see, generally,
Remington's Pharmaceutical Sciences, 18th Ed., Gennaro, ed., Mack Publishing Co., Easton, PA (1990)).
Pharmaceutical/Diagnostic Administration 

Using compounds or compositions comprising at least one PPCA or pPPCA modulating ligand, the present
invention further provides a method for modulating the activity of a PPCA or pPPCA protein in a cell. In general,
ligands (antagonists or agonists) which have been identified to inhibit or enhance the activity of at least one PPCA or
pPPCA protein can be formulated so that the ligand can be contacted with a cell expressing at least one PPCA or pPPCA
protein in vivo. The contacting of such a cell with such a ligand results in the in vivo modulation of at least one
biological activity of a PPCA or pPPCA.

At least one PPCA or pPPCA modulating compound or composition of the invention can be administered by 
any means that achieve the intended purpose, using a suitable pharmaceutical composition or formulation. For example, 
administration can be by various parenteral routes such as subcutaneous, intravenous, intradermal, intramuscular, 
intraperitoneal, intranasal, intracranial, transdermal, or buccal routes. Alternatively, or concurrently, administration can 
be by the oral route. Parenteral administration can be by bolus injection or by gradual perfusion over time. 

A typical regimen for treatment or prophylaxis comprises administration of an effective amount over a period
of one or several days, up to and including a period of between about one week and about six months. It is understood
that the dosage of a diagnostic/pharmaceutical compound or composition of the invention administered in vivo or in vitro
will be dependent upon the age, sex, health, and weight of the recipient, kind of concurrent treatment, if any, frequency of
treatment, and the nature of the diagnostic/pharmaceutical effect desired. The ranges of effective doses provided herein
are not intended to be limiting and represent preferred dose ranges. However, the most preferred dosage will be tailored
to the individual subject, as is understood and determinable by one skilled in the relevant arts. See, e.g., Berkow et al.,
eds., The Merck Manual, 16th edition, Merck and Co., Rahway, N.J. (1992); Goodman et al., eds., Goodman and
Gilman's The Pharmacological Basis of Therapeutics, 8th edition, Pergamon Press, Inc., Elmsford, N.Y. (1990); Avery's
Drug Treatment: Principles and Practice of Clinical Pharmacology and Therapeutics, 3rd edition, ADIS Press, Ltd.,
Williams and Wilkins, Baltimore, MD (1987); Ebadi, Pharmacology, Little, Brown and Co., Boston (1985); Osol et al.,
eds., Remington's Pharmaceutical Sciences, 18th edition, Mack Publishing Co., Easton, PA (1990); Katzung, Basic and
Clinical Pharmacology, Appleton and Lange, Norwalk, CT (1992), which references are entirely incorporated herein
by reference.

The total dose required for each treatment can be administered by multiple doses or in a single dose. The 
diagnostic/pharmaceutical compound or composition can be administered alone or in conjunction with other diagnostics 
and/or pharmaceuticals directed to the pathology, or directed to other symptoms of the pathology. Effective amounts 
of a diagnostic/pharmaceutical compound or composition of the invention are from about 0.1 µg to about 100 mg/kg
body weight, administered at intervals of 4-72 hours, for a period of 2 hours to 1 year, and/or any range or value therein.
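The unit bookkeeping implied by these ranges (µg/kg versus mg/kg, interval versus treatment period) can be checked with a small sketch. It assumes a dose expressed per kilogram of body weight and administrations at fixed intervals starting at time zero. The function name and the example values are hypothetical and illustrate arithmetic only, not a recommended regimen.

```python
import math

MIN_DOSE_MG_PER_KG = 1e-4   # 0.1 µg/kg expressed in mg/kg
MAX_DOSE_MG_PER_KG = 100.0

def dosing_schedule(dose_mg_per_kg, body_weight_kg, interval_h, period_h):
    """Return (mg per administration, number of administrations in the period)."""
    if not MIN_DOSE_MG_PER_KG <= dose_mg_per_kg <= MAX_DOSE_MG_PER_KG:
        raise ValueError("dose outside about 0.1 µg/kg to 100 mg/kg")
    if not 4 <= interval_h <= 72:
        raise ValueError("interval outside 4-72 h")
    if not 2 <= period_h <= 365 * 24:
        raise ValueError("period outside 2 h to 1 year")
    amount_mg = dose_mg_per_kg * body_weight_kg
    # Administrations at t = 0, interval, 2 * interval, ... before the period ends.
    n_doses = math.ceil(period_h / interval_h)
    return amount_mg, n_doses

# Hypothetical example: 70 kg recipient, 1 mg/kg every 24 h for 7 days.
print(dosing_schedule(1.0, 70.0, 24, 7 * 24))
```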

The recipients of administration of compounds and/or compositions of the invention can be any mammals. 
Among mammals, the preferred recipients are mammals of the Orders Primates (including humans, apes and monkeys),
Artiodactyla (including horses, goats, cows, sheep, pigs), Rodentia (including mice, rats, rabbits, and hamsters), and
Carnivora (including cats and dogs). The most preferred recipients are humans.

Having now generally described the invention, the same will be more readily understood through reference 
to the following example which is provided by way of illustration, and is not intended to be limiting of the present 
invention. 

Example 1: Preparation, Purification and Crystallization of PPCA or pPPCA from Human Cells

The present invention provides, in one aspect, the determination of the three-dimensional structure of the human 
protective protein/cathepsin A (PPCA) in the precursor form (pPPCA) by a combination of molecular replacement and 
twofold density averaging. The structure presented here is the first of an enzyme associated with a human PPCA related 
pathology, and the third human lysosomal enzyme structure determined. The structure gives us insight into the zymogen 
activation mechanism of pPPCA, as well as the expected 3-D structure of PPCA and its specific and new enzymatic
activities. 

PPCA and pPPCA Expression and Purification 

Plasmid Constructs. AcMNPV transfer plasmids pJR2 and pBC3 (Figure 1) were derivatives of plasmid
pAc373, carrying the entire polyhedrin gene (Smith et al., 1985). In pJR2, a polylinker with a number of multiple
cloning sites (MCS) was inserted directly 3' of the polyhedrin promoter, replacing 33 nucleotides of the
polyhedrin gene starting with the ATG. pBC3 had the polylinker situated in a similar position as in pJR2, but instead of
the 33-nt deletion this plasmid featured an ATG codon mutated to ACG. Full-length human PPCA cDNA PPCA54
(Galjart et al., 1988) and the two deletion cDNA mutants 32U20) and 20U32) (Galjart et al., 1991) were subcloned
either in pJR2 or pBC3 as EcoRI fragments, using standard procedures (Sambrook et al., 1989) (Figure 1). The
20U32) deletion mutant was tagged with the human PPCA signal sequence, as reported earlier (Galjart et al., 1991).
All cDNA fragments were engineered to have short 5' and 3' untranslated regions (< 10 bp).

Transfection and Selection of Recombinant Baculovirus. Spodoptera frugiperda insect cells (IPLB-Sf21)
were cultured in monolayers at 27°C in TNM-FH medium (Hink, 1970), supplemented with 10% FBS and antibiotics
(complete medium). Wild-type (wt) AcMNPV virus strain E2 (Smith and Summers, 1978) and recombinant
baculoviruses were propagated on confluent monolayers of Sf21 cells. Recombinant constructs AcPPCA54, AcPPCA32
and AcPPCA20 were generated by cotransfecting Sf21 cells with 1 µg wt-AcMNPV DNA and 10 µg plasmid DNA
using the calcium phosphate method, modified for insect cells (Graham et al., 1973; Carstens et al., 1980; Summers et
al., 1987). Polyhedrin-negative recombinant baculoviruses were then selected and purified by sequential
plaque assays, and verified by dot blot and Southern blot analysis (Summers et al., 1987). Large quantities of inoculum
were produced by infection of insect cells at 25-50% confluency with recombinant virus at a multiplicity of infection
(MOI) of < 1 pfu/cell. After 3 to 6 days at 27°C, when all cells appeared infected, the medium was harvested and
centrifuged for 5 min at 1000 rpm to remove detached cells. The titer of the inoculum was determined by plaque assay
analysis.

Protein purification and western blotting. Sf21 cells were cultured in either 175 cm² or 500 cm² flasks (triple
flask, Nunc) to near confluency, and infected with recombinant baculoviruses at an MOI of 5-10 pfu/cell. After 1.5 h
incubation at 27°C, the inoculum was replaced with complete medium for an additional 8 to 10 h. Cell monolayers were
then rinsed with PBS and cultured further for 38 h in unsupplemented Grace's medium. After infection, the medium was
collected and centrifuged for 5 min at 1500 g and for 1 h at 100,000 g (Beckman SW-28 rotor) to remove virus particles.
After centrifugation, the supernatant was concentrated 20-fold in an Amicon stirred cell. Glycoproteins were purified
~60% using a concanavalin A-SEPHAROSE affinity chromatography column, as described earlier (Verheijen et al.,
1982). Total protein concentration was measured using the method of Smith et al. (1985). Aliquots of the purified
preparation were resolved on 12.5% SDS-polyacrylamide gels under reducing and non-reducing conditions. Gels were
either Coomassie brilliant blue- or silver-stained (Sambrook et al., 1989). For western blotting, proteins were transferred
from gels to IMMOBILON PVDF membranes (Millipore Corp.) using a semidry blotter (The W.E.P. Company).
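The plaque-assay titer and the multiplicity of infection used above are linked by simple arithmetic, sketched below. The standard relations assumed here are titer = plaques / (dilution × volume plated) and inoculum volume = MOI × cell number / titer. The function names and all numbers (plaque count, dilution, cell number) are hypothetical; the text gives only the MOI ranges.

```python
def titer_pfu_per_ml(plaques, dilution, volume_plated_ml):
    """Plaque-assay titer: plaques / (dilution x volume plated), in pfu/ml."""
    return plaques / (dilution * volume_plated_ml)

def inoculum_volume_ml(moi, cell_count, titer):
    """Volume of virus stock (ml) that delivers `moi` pfu per cell."""
    return moi * cell_count / titer

# Hypothetical numbers: 50 plaques from 0.1 ml of a 10^-6 dilution,
# and a flask holding 2 x 10^7 cells infected at MOI 5.
titer = titer_pfu_per_ml(50, 1e-6, 0.1)
print(f"titer = {titer:.2e} pfu/ml, inoculum = {inoculum_volume_ml(5, 2e7, titer):.2f} ml")
```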

Development and Use of pPPCA Antibodies. A 15-amino-acid peptide
(NH2-Cys-Met-Trp-His-Gln-Ala-Leu-Leu-Arg-Ser-Glu-Asp-Lys-Ala-Arg-COOH) (Figure 5), based on the C-terminal
sequence of the 34-kDa PPCA subunit (amino acids 285-298, Galjart et al., 1988), was synthesized on a peptide
synthesizer (Applied Biosystems) and covalently linked to the carrier protein keyhole limpet hemocyanin, using the
IMJECT ACTIVATED IMMUNOGEN CONJUGATION KIT (Pierce). Polyclonal antibodies against the conjugated product were
raised in rabbit by multiple subdermal injections of the protein (40-125 µg) mixed with incomplete Freund's adjuvant
(Pierce). Rabbits were bled 34 days after the first injection. The antibodies, designated anti-pep, were tested on
immunoblots and by immunoprecipitation of baculovirus-produced PPCA.

Blots were incubated for at least 12 h in blocking buffer (0.01 M Tris-buffered saline pH 8.0 (TBS), 0.05%
Tween 20, and BSA), and subsequently probed for 2 h with polyclonal PPCA antibodies, anti-54, diluted 1:200
in fresh blocking buffer. They were then washed for 1 h in TBS, 0.05% Tween 20, and incubated for 2 h with alkaline
phosphatase-conjugated anti-rabbit IgG (Sigma, 1:1000 in blocking buffer). Proteins were visualized using alkaline
phosphatase substrate (Sigma, 4-aminodiphenylamine diazonium sulfate, naphthol AS-MX phosphate).

Crystallization of PPCA. Fractions containing the precursor form of the protein, as assayed on an SDS-PAGE
gel, were pooled. Subsequently, the protein was concentrated to 5 mg/ml and the buffer exchanged to 50 mM NaAc pH
5.2 or 50 mM MES pH 6.5 using a CENTRICON-10. Crystals were grown using the hanging drop vapor diffusion
technique. Crystals suitable for data collection were grown using a reservoir solution containing 2-10% PEG 8000,
50 mM TRIZMA pH 8.0-8.3, 1 mM NaN3 and 0.25% β-octyl glucoside at 4-12°C. Mixing non-equal volumes of protein
solution (in the range 5-10 µl) and reservoir solution (in the range 2-6 µl) enhanced the occurrence of single large
crystals per drop under these crystallization conditions. The concentration of the protein solution before mixing was
5 mg/ml. Crystal growth was enhanced by macrocrystallization techniques (i.e., any technique that promotes the growth
of large crystals) and in some cases by micro- and macroseeding techniques.
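The effect of mixing non-equal volumes can be made concrete with a small mass-balance sketch. It computes the protein and PEG concentrations of a freshly mixed drop and an idealised protein concentration after vapour equilibration, assuming that the drop loses water until its PEG concentration matches the reservoir and that no protein precipitates. These assumptions, the function names and the example volumes are hypothetical; the text itself does not state why unequal volumes helped.

```python
def drop_concentrations(protein_ul, reservoir_ul, protein_mg_ml, peg_percent):
    """Protein (mg/ml) and PEG (% w/v) in a freshly mixed hanging drop."""
    total_ul = protein_ul + reservoir_ul
    return (protein_mg_ml * protein_ul / total_ul,
            peg_percent * reservoir_ul / total_ul)

def equilibrated_protein_mg_ml(protein_ul, reservoir_ul, protein_mg_ml):
    """Idealised protein concentration after vapour equilibration.

    Assumes the drop loses water until its PEG concentration equals that of the
    reservoir (drop volume then close to reservoir_ul) and no protein precipitates.
    """
    return protein_mg_ml * protein_ul / reservoir_ul

# Hypothetical drop: 8 ul of 5 mg/ml protein plus 4 ul of a 6% PEG 8000 reservoir.
print(drop_concentrations(8, 4, 5.0, 6.0))        # initial protein and PEG
print(equilibrated_protein_mg_ml(8, 4, 5.0))      # after equilibration
```

Under these assumptions, a larger protein-to-reservoir volume ratio gives a higher final protein concentration at the same PEG concentration.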

Example 2: Structure Determination of a pPPCA Crystallized from Human Cells 
Data Collection, Data Processing and Reduction

To allow for data collection at cryotemperatures, the crystals were cryoprotected by adding glycerol in 5-10%
steps to a solution of about 12% PEG 8000, 50 mM TRIZMA pH 8.0, 1 mM NaN3 and 0.25% β-octyl glucoside, which
served as an artificial mother liquor. The crystals were incubated for half an hour at 40°C after each addition of
glycerol. The final mother liquor contained 30% glycerol. Increasing the glycerol concentration gradually helped keep
the crystals from cracking.

Diffraction data were collected at the Stanford Synchrotron Radiation Laboratory (SSRL) to 2.0 Å at -178°C
on a MAR imaging plate at a wavelength of 1.08 Å on beam line 7-1. The diffraction coordinate data (corresponding
to the atomic coordinates of monomer 1; the coordinates of the other monomer are provided by matrix conversion of these
coordinates, as presented herein) were processed and reduced using MOSFLM version 5.2 from the CCP4 program
package (SERC (UK) Collaborative Computing Project 4, Daresbury Laboratory, UK, 1979). The program REFIX
(Kabsch (1993), infra) was used for auto-indexing. Using the CCP4 program suite (SERC (UK) Collaborative
Computing Project 4, Daresbury Laboratory, UK, 1979), the intensities were scaled (ROTAVATA), merged
(AGROVATA), and then converted to amplitudes and truncated with the program TRUNCATE. Statistics of the data
collected are given in Table 1. The V_M (Matthews, B.W., J. Mol. Biol. 33:491-497 (1968)) is 3.2 Å³/Da for 2 monomers
in the asymmetric unit, corresponding to a solvent content of 62%.
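The relation between V_M and solvent content quoted above follows the Matthews (1968) formulas, sketched below. V_M is the unit-cell volume divided by the total protein mass in the cell, and the solvent fraction is 1 - 1.23/V_M, where 1.23 corresponds to a protein partial specific volume of about 0.74 cm³/g. The cell volume and space group are not given in this section, so the first function is shown without PPCA values; the function names and the test numbers are hypothetical. With V_M = 3.2 Å³/Da the formula gives about 62%, matching the value stated in the text.

```python
def matthews_coefficient(cell_volume_a3, molecules_per_cell, mw_da):
    """V_M (Å^3/Da): unit-cell volume divided by the total protein mass in the cell.

    molecules_per_cell = (symmetry operators of the space group) x
    (molecules per asymmetric unit), e.g. 2 monomers per asymmetric unit here.
    """
    return cell_volume_a3 / (molecules_per_cell * mw_da)

def solvent_fraction(vm):
    """Matthews (1968) solvent fraction, 1 - 1.23 / V_M.

    The constant 1.23 corresponds to a protein partial specific volume of
    about 0.74 cm^3/g.
    """
    return 1.0 - 1.23 / vm

print(f"solvent content at V_M = 3.2 Å^3/Da: {solvent_fraction(3.2):.0%}")
```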
Molecular Replacement 

Search Model: The best molecular replacement results were obtained using a multi-Ala core as a search probe.
The 'multi-Ala core' search model was constructed from the atomic coordinates of the CPW monomer (Liao et al. 1992)
based on the sequence alignment as presented in Figure 15. Regions expected to deviate in structure between PPCA and
CPW (low sequence identity or loops) were excluded from the model. The 125 residues identical in PPCA and CPW
were left in the model; 112 residues were truncated to alanine. The remaining 94 residues, though differing between
CPW and PPCA, were considered sufficiently similar in size, and the CPW residue was left as such in the model. The
resulting 'multi-Ala core' monomer consisted of 331 residues, constituting a large portion of the core domain and
little atomic information for the 'cap' domain (see Figure 1). The model contained 30% of the expected protein
scattering mass, given the fact that there are two monomers in the asymmetric unit. The sequence identity between this
search model and the true PPCA structure was 37.7%.
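The per-residue rule used to build the 'multi-Ala core' can be written as a small classification over an aligned sequence pair. This is a sketch only: the size classes below are a hypothetical grouping (the text does not say how "sufficiently similar in size" was judged), and the separate removal of whole regions expected to deviate in structure is not modeled.

```python
from collections import Counter

# Hypothetical side-chain size classes, for illustration only.
SIZE_CLASS = {
    **dict.fromkeys("GAS", "small"),
    **dict.fromkeys("CTVPDN", "medium"),
    **dict.fromkeys("ILMEQHK", "large"),
    **dict.fromkeys("FYWR", "bulky"),
}


def classify(template, target):
    """Label each aligned position for a 'multi-Ala' search model.

    template: aligned template sequence (here CPW); target: aligned PPCA.
    '-' marks an alignment gap.
    """
    labels = []
    for t, p in zip(template, target):
        if t == "-" or p == "-":
            labels.append("omit")            # no template residue, or residue absent in target
        elif t == p:
            labels.append("keep")            # identical: keep the full side chain
        elif SIZE_CLASS.get(t) is not None and SIZE_CLASS.get(t) == SIZE_CLASS.get(p):
            labels.append("keep_template")   # similar size: leave the template residue
        else:
            labels.append("ala")             # truncate the side chain to alanine
    return labels


labels = classify("AKF-LWG", "AKYGIWR")
print(Counter(labels))
```

Every label other than "omit" ends up in the search model; in the text the three retained categories total 125 + 112 + 94 = 331 residues.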

Rotation Function, PC Refinement and Translation Function: Native data from 8 - 4 Å were used in the
molecular replacement calculations. The rotational searches utilized a real-space Patterson search method as
implemented in X-PLOR (Steigemann, 1974; Huber, 1985; Brünger, 1992a) with a Patterson vector cutoff of 21 Å. The
self-rotation function failed to reveal any non-crystallographic two-fold symmetry relating the two monomers in the
asymmetric unit. In addition, the native self-Pattersons did not reveal the presence of a non-crystallographic two-fold
axis parallel to a crystallographic axis. These results indicated that the two monomers in the asymmetric unit might not
form a dimer together. The cross-rotation function was calculated to find the orientations of the two monomers in the
asymmetric unit, as follows. Patterson vector sets were calculated for the search model and the native data, and the 8000
strongest Patterson vectors were used in the rotation function. The rotational space, restricted to the asymmetric unit of
the rotation function according to Rao et al., 1980, was sampled by rotating the Patterson vectors from the search model
around Eulerian angles θ1, θ2 and θ3, while sampling θ2 in angular grid intervals of 2.5°. The 5000 highest rotation
function grid points resulting from the product function of the two Patterson vector sets were selected. These grid points
were then clustered (points differing by less than 8° around any given axis were grouped). The result was a list of 169
possible solutions for the rotation function, each corresponding to a set of three angles describing an orientation. The
two top solutions were 3.9 and 3.8 σ above the mean. PC-refinement (Brünger, 1990) was carried out to optimize each of
the 169 possible solutions using the complete search model as a single rigid body. This yielded two orientations with
PC-indices of 0.043 and 0.051, respectively. The orientations of these solutions were (θ1 = 261.4°, θ2 = 36.22°,
θ3 = 147.28°) and (θ1 = 18.52°, θ2 = 47.40°, θ3 = 23.22°), respectively. In contrast, the rest of the possible solutions
yielded an average PC-index of 0.022.
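The clustering of rotation-function grid points into candidate orientations can be sketched as a greedy procedure. This assumes that "differing less than 8° around any given axis" means every Eulerian angle lies within 8° of a cluster's highest peak; X-PLOR's own clustering may work differently, and per-angle comparison is only a rough proxy for rotational distance (it breaks down near θ2 = 0). The peak list is made up for illustration.

```python
def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def cluster_peaks(peaks, tol=8.0):
    """Greedy clustering of rotation-function peaks.

    peaks: list of (height, (theta1, theta2, theta3)) in degrees.
    Peaks are visited from highest to lowest; each joins the first cluster whose
    representative (its highest peak) is within `tol` degrees in every angle.
    """
    clusters = []
    for height, angles in sorted(peaks, key=lambda p: p[0], reverse=True):
        for cluster in clusters:
            rep = cluster[0][1]
            if all(angular_diff(x, y) < tol for x, y in zip(angles, rep)):
                cluster.append((height, angles))
                break
        else:
            clusters.append([(height, angles)])
    return clusters


peaks = [(3.9, (261.4, 36.2, 147.3)), (3.5, (263.0, 38.0, 150.0)),
         (3.8, (18.5, 47.4, 23.2)), (2.0, (15.0, 47.0, 20.0)),
         (1.0, (300.0, 100.0, 10.0))]
clusters = cluster_peaks(peaks)
print([len(c) for c in clusters])
```

Each resulting cluster would then be represented by its highest peak as one candidate orientation for PC-refinement.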

Individual translation function calculations were performed on a 1 Å grid. A translational solution was found
for each orientation, at positions (x = 33.30, y = 51.97, z = 12.79) and (x = 25.23, y = 28.58, z = 22.02) with respect
to the crystallographic center, at 7.7 and 8.8 σ above the mean, respectively. The R-factor for the individual solutions
was 55.6% and 54.8% in the resolution range 8.0 to 4.0 Å, with correlation coefficients (CC) of 0.095 and 0.114. A
combined translation function was calculated to place each solution relative to the same crystallographic origin,
resulting in an R_free of 52.8% for data between 8.0 and 4.0 Å, bringing the R-factor down to 51.3% and increasing the
CC to 0.22. The molecular packing was assessed on a graphics workstation, which revealed no clashes between the placed
search probes. However, a very large amount of empty space was present. The packing showed that the asymmetric unit
contained two half-dimers, each forming a dimer with another monomer in a neighboring unit cell. The two cores in
the asymmetric unit were related by κ = 73° around an axis tilted 15.5° off the crystallographic a axis lying in the ac
plane.
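For reference, the R-factor and correlation coefficient used to judge the translation solutions can be computed as below. This sketch works on amplitudes with a single least-squares scale factor k; the exact quantities X-PLOR correlates in its translation function (for example intensities or normalized values) may differ, so treat it as an illustration of the definitions rather than a reimplementation.

```python
import math


def r_factor(f_obs, f_calc):
    """R = sum | |Fo| - k|Fc| | / sum |Fo|, with k the least-squares scale factor."""
    k = sum(o * c for o, c in zip(f_obs, f_calc)) / sum(c * c for c in f_calc)
    return sum(abs(o - k * c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


f_obs = [10.0, 20.0]
f_calc = [1.0, 3.0]
print(r_factor(f_obs, f_calc), correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.1]))
```

A perfect model gives R = 0 and CC = 1; the values near 52-55% and 0.1-0.2 quoted above reflect how little of the structure the search probes described.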

Iterative Model Building and Two-fold Averaging

Initial Electron Density Map: A 2m|F_obs| - D|F_calc| SigmaA-weighted map (Read, 1986) was calculated using
the |F_obs| values and phases from the molecular replacement solution. The map was contoured at 1σ and showed good
density for most of the core. Density emerged for many side chains where the input model residue had been an Ala,
indicating that the molecular replacement solution was correct.
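The map coefficient named above, 2m|F_obs| - D|F_calc| combined with the model phase, is simple to form once the figure of merit m and the scale D are known. In this sketch m and D are just passed in; estimating them from σA (Read, 1986) is not shown, and this form applies to acentric reflections only.

```python
import cmath
import math


def sigmaa_coefficient(f_obs, f_calc, phi_calc_deg, m, d):
    """Complex map coefficient (2m|Fo| - D|Fc|) * exp(i * phi_calc) for an acentric reflection."""
    amplitude = 2.0 * m * f_obs - d * f_calc
    return cmath.rect(amplitude, math.radians(phi_calc_deg))


coef = sigmaa_coefficient(f_obs=100.0, f_calc=80.0, phi_calc_deg=0.0, m=0.9, d=0.8)
print(coef)
```

Fourier transforming such coefficients over all reflections yields the SigmaA-weighted map.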
First Model Built: The two rotated and translated search probes formed the starting point for model building
of the PPCA precursor. The non-crystallographic symmetry (NCS) matrix relating the two cores was determined using
the "Lsq_explicit" option in the computer program O (Jones et al., 1991). Subsequently a 'best monomer' was built by
superimposing the electron densities from each monomer core and adjusting the model accordingly. Residues were only
incorporated in the model where the electron density was visible for the complete side chain. Residues from the search
model for which no density was visible were removed. An alanine was built in the model at places where the electron
density for a side chain was partial. In this manner 294 residues, i.e. 65% of the Cα atoms, were built in the 'best
monomer' core. The second monomer was generated from the 'best monomer' model using the NCS operator relating
the two monomers in the asymmetric unit. At this point the data set was partitioned into a working set and a test set,
the latter consisting of 5% of the reflections between 8 and 2.2 Å, to monitor the R_free (Brünger et al. 1992b). The
working data set was used for rigid-body and positional refinement. For averaging and map calculations the
unpartitioned data set was used. Twenty-five cycles of refinement, using the two 'best monomer cores' positioned in the
asymmetric unit as rigid bodies and data from 8.0 - 3.0 Å, resulted in an R-factor of 53.5% for this resolution range.
The atomic coordinates of this partial model were used to calculate a new 2m|F_obs| - D|F_calc| SigmaA-weighted map,
which we called the 'best monomer map'.
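Setting aside a 5% test set to monitor R_free can be sketched as a seeded random selection over reflection indices. The text does not say how the test reflections were chosen, so random selection is an assumption here; with non-crystallographic symmetry present, a purely random choice leaves test reflections partly correlated with working ones, which is why thin-shell selection is sometimes preferred.

```python
import random


def split_free_set(hkls, fraction=0.05, seed=0):
    """Flag a random `fraction` of reflections as the test (R_free) set."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    work, test = [], []
    for hkl in hkls:
        (test if rng.random() < fraction else work).append(hkl)
    return work, test


hkls = [(h, k, l) for h in range(20) for k in range(20) for l in range(20)]
work, test = split_free_set(hkls)
print(len(work), len(test))
```

The working list feeds refinement, while R computed over the test list gives R_free.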

Averaging: Search for Missing Density: The phasing power from the rigid-body refined 'best monomer
cores', consisting of 294 residues per core, was insufficient to bring back interpretable electron density for the missing
part of the model (158 residues per monomer). To overcome this, a 'bootstrapping' procedure was applied, entailing
density averaging using RAVE (Kleywegt & Jones, 1994a) and model expansion. The 'best monomer map' and the
rigid-body refined 'best monomer cores' served as the starting point for this procedure.
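The core operation of two-fold averaging can be sketched on a density grid: each point inside the monomer mask is paired with its NCS mate, and both receive the mean value. This is a simplified illustration under stated assumptions (nearest-grid-point lookup instead of interpolation, an NCS operator expressed directly in grid-index units, no crystal symmetry or cell periodicity), all of which a program such as RAVE handles properly.

```python
import numpy as np


def average_two_fold(rho, mask, rot, trans):
    """Two-fold NCS averaging of a density grid (nearest-grid-point sketch).

    rho:   3D density array.
    mask:  boolean array, True inside the envelope of monomer A.
    rot, trans: NCS operator mapping A onto B in grid-index units (x_B = rot @ x_A + trans).
    Returns a copy of rho in which each masked point and its NCS mate hold their mean.
    """
    out = rho.copy()
    idx_a = np.argwhere(mask)                                  # (n, 3) grid indices in A
    idx_b = np.rint(idx_a @ rot.T + trans).astype(int)         # nearest grid point in B
    inside = np.all((idx_b >= 0) & (idx_b < np.array(rho.shape)), axis=1)
    a, b = idx_a[inside], idx_b[inside]
    avg = 0.5 * (rho[tuple(a.T)] + rho[tuple(b.T)])
    out[tuple(a.T)] = avg
    out[tuple(b.T)] = avg
    return out


rho = np.arange(4, dtype=float).reshape(4, 1, 1)
mask = np.zeros((4, 1, 1), dtype=bool)
mask[0:2] = True                                               # "monomer A" occupies x = 0, 1
out = average_two_fold(rho, mask, np.diag([-1.0, 1.0, 1.0]), np.array([3.0, 0.0, 0.0]))
print(out.ravel())
```

Averaging reinforces density common to both copies and damps noise, which is what lets the missing parts of the model emerge over successive cycles.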
Six bootstrapping cycles, called bmc1 through bmc6, were carried out, allowing the model to be extended in
stepwise increments. Figure 16 shows a scheme of the steps incorporated in one bootstrapping cycle. After a cycle
in which the model had undergone major expansion, a new molecular mask was calculated with MAMA (Kleywegt &
Jones, 1994b) for use in the subsequent bootstrapping cycle. No phase recombination was applied between
bootstrapping cycles. At the end of each cycle the inverted phases and inverted amplitudes were discarded.
The NCS operator was re-optimized after cycle bmc3. The resolution range of the data included in the bootstrapping
cycles started at 15 - 3.0 Å for bmc1 and was gradually extended to 15 - 2.7 Å in bmc6. The bootstrapping procedure
is summarized in Table 2. To optimize the bootstrapping procedure, consideration was given to the molecular mask used
in the averaging, the model-building strategy and the refinement procedure.
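The control flow of the bootstrapping procedure described above can be outlined as a loop. This is only a skeleton: the step functions are hypothetical placeholders supplied by the caller, the contents of each cycle (Figure 16) are not reproduced, and only features stated in the text are encoded (six cycles, a new mask after a cycle with major model expansion, NCS re-optimization after bmc3, no phase recombination between cycles).

```python
def bootstrap(model, steps, major_expansion, n_cycles=6, reoptimize_ncs_after=3):
    """Outline of iterative NCS averaging plus model expansion.

    steps: dict of caller-supplied callables (hypothetical names)
      'average'(model) -> map        averaged map computed from model phases only
      'build'(model, map) -> model   extend the model into new density
      'new_mask'(model)              recompute the molecular mask
      'reoptimize_ncs'(model)        refine the NCS operator
    major_expansion(cycle) -> bool   whether the model grew substantially this cycle
    """
    log = []
    for cycle in range(1, n_cycles + 1):
        name = f"bmc{cycle}"
        averaged = steps["average"](model)       # nothing carried over from the previous map
        model = steps["build"](model, averaged)
        if major_expansion(cycle):
            steps["new_mask"](model)             # mask is used in the next cycle
            log.append(f"{name}: new mask")
        if cycle == reoptimize_ncs_after:
            steps["reoptimize_ncs"](model)
            log.append(f"{name}: NCS re-optimized")
    return model, log


steps = {
    "average": lambda model: None,
    "build": lambda model, averaged: model + 10,   # toy "model" is just a residue count
    "new_mask": lambda model: None,
    "reoptimize_ncs": lambda model: None,
}
final, log = bootstrap(0, steps, major_expansion=lambda c: c in (2, 4))
print(final, log)
```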

Molecular masks: Four different masks were constructed in total. The atomic radius of all atoms was set to
4 Å to calculate each mask. The masks were then manually modified using the mask editing options in O (Jones et al.,
1991). Mask1 was constructed around the 'best monomer core'. Subsequently it was greatly enlarged by multiple blocks
of 10 - 15 Å³ in the regions where the model was incomplete (Figure 17). This was crucial to prevent the density in the
insertion areas from being flattened during the averaging step. Approximately one half of the dimer interface was
estimated to be formed by regions from the missing cap domain, and major expansions of the mask were made in this
area to accommodate this. This resulted in a serious overlap problem when the mask was duplicated to cover a complete
dimer. The mask was reduced where overlap occurred with the "overlap_trim" option of MAMA. After several
bootstrapping cycles, newly incorporated polypeptide fragments were carefully assigned to one of the two monomers
forming the dimer, and the mask at the dimer interface areas was manually adjusted accordingly. Essentially, the masks
were kept far too large in regions where the model was missing in order to avoid erroneous flattening of electron
density. In contrast, the masks were tightened around the areas of the molecule where the model was complete.
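Building a mask from a 4 Å radius around every atom, and resolving grid points claimed by both monomers, can be sketched with NumPy. The nearest-atom rule used to trim the overlap is an illustrative assumption (the text does not describe how MAMA decides), the manual block additions made in O are not modeled, and the grid is assumed to start at the origin with cubic voxels.

```python
import numpy as np


def grid_points(shape, spacing):
    """Cartesian coordinates (A) of every grid point; origin at (0, 0, 0)."""
    axes = [np.arange(n) * spacing for n in shape]
    return np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)


def atom_mask(coords, shape, spacing, radius=4.0):
    """True at grid points lying within `radius` A of at least one atom."""
    pts = grid_points(shape, spacing)
    mask = np.zeros(shape, dtype=bool)
    for atom in np.asarray(coords, dtype=float):
        mask |= np.sum((pts - atom) ** 2, axis=-1) <= radius ** 2
    return mask


def trim_overlap(mask_a, mask_b, coords_a, coords_b, spacing):
    """Assign points claimed by both masks to the monomer with the nearer atom."""
    both = mask_a & mask_b
    pts = grid_points(mask_a.shape, spacing)[both]                 # (m, 3)
    da = np.linalg.norm(pts[:, None] - np.asarray(coords_a, float)[None], axis=-1).min(axis=1)
    db = np.linalg.norm(pts[:, None] - np.asarray(coords_b, float)[None], axis=-1).min(axis=1)
    a, b = mask_a.copy(), mask_b.copy()
    a[both] = da <= db
    b[both] = db < da
    return a, b


shape, spacing = (21, 1, 1), 1.0
mono_a, mono_b = [[5.0, 0.0, 0.0]], [[11.0, 0.0, 0.0]]
a, b = trim_overlap(atom_mask(mono_a, shape, spacing), atom_mask(mono_b, shape, spacing),
                    mono_a, mono_b, spacing)
print(a.sum(), b.sum(), (a & b).any())
```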

Model Building: A conservative model-building strategy was adopted. Initially, side chains were mutated only
in the core region to fit the PPCA amino acid sequence, and where the density was clear, poly-alanine fragments were
built in the insertion areas (loops and the cap domain). Newly included atoms were given a B-factor of 20 Å². Only
once models bmc5 and bmc6 were obtained was the electron density of sufficient quality to allow side chains to be
incorporated confidently in the cap domain (residues 190 - 303). At this stage the Cα trace was virtually complete for
the whole dimer and the sequence could be fit unambiguously.


Refinement: Positional refinement was postponed until after 3 cycles of bootstrapping, resulting in a model
containing 91% of the Cα atoms. Forty steps of positional refinement were then carried out to improve the
geometry of the model. Subsequently only one of the refined monomers was taken, and the other was generated using
the NCS operators. The rationale for delaying the positional refinement is addressed in the discussion.
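Generating the second monomer from the first is a single affine transformation per atom, x' = R·x + t. The sketch below applies an operator given as a 3×3 rotation matrix and a translation vector (in Å) and checks that the matrix is a proper rotation; the operator values in the example are made up for illustration.

```python
import numpy as np


def apply_ncs(coords, rot, trans):
    """Return NCS-related coordinates x' = R x + t for an (n, 3) coordinate array."""
    rot = np.asarray(rot, dtype=float)
    if not np.allclose(rot @ rot.T, np.eye(3), atol=1e-4) or np.linalg.det(rot) < 0:
        raise ValueError("operator matrix is not a proper rotation")
    return np.asarray(coords, dtype=float) @ rot.T + np.asarray(trans, dtype=float)


# Hypothetical two-fold about z, offset along x.
two_fold = np.diag([-1.0, -1.0, 1.0])
shift = [10.0, 0.0, 0.0]
mate = apply_ncs([[1.0, 2.0, 3.0]], two_fold, shift)
print(mate)                               # coordinates of the NCS mate
print(apply_ncs(mate, two_fold, shift))   # a true two-fold applied twice returns the start
```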

Completing the model: deviations from two-fold symmetry. It was possible to add 148 residues and 185 side
chains per monomer after a total of 6 bootstrapping cycles. At this stage, each subunit contained 442 residues and 413
side chains, i.e. 98% of the Cα atoms and 91% of the side-chain atoms. The gradual model expansion as a function of
the bootstrapping cycle is shown in Figure 18.

Twenty residues were still missing in the asymmetric unit at this stage. These were localized to two stretches
per monomer (260 - 262 and 287 - 292). With most of the scattering mass incorporated, the monomers from model bmc6
were refined individually with X-PLOR (Brünger, 1992a) in an attempt to retrieve electron density for the still-missing
residues. After 40 steps of positional refinement using data from 8.0 - 2.6 Å, the R-factor dropped significantly from
40.2% to 33.2%. The model was further positionally refined using a full weight on the crystallographic term. The data
included in the refinement were gradually extended to 2.2 Å. At 2.4 Å resolution, individual B-factors were refined and
their distribution was checked as a function of atom location (i.e., low B-factors in the core and high B-factors on the
surface). Cycles of refinement and rebuilding allowed 18 missing residues to be added. Essentially, almost the complete
cap domain was retrieved using the bootstrapping procedure, as shown in Figure 19. It became apparent from the refined
maps that the two stretches of missing amino acids adopted very different conformations in the two monomers (with
an average r.m.s.d. of as much as 7.9 Å for the Cα atoms of residues 287 - 292). For this reason, electron density for
these regions had not been retrieved in the two-fold averaging process. The stepwise improvement of the electron
density maps along with averaging, model expansion and refinement is shown in Figure 6.

The program ARP was used to check our model, in particular the region at the dimer interface (Lamzin &
Wilson, 1993). Prior to the final round of positional refinement, an |F_obs|/σ cutoff was applied to reject the weakest
10% of the data, and an anisotropic scale factor was applied to offset the decreased resolution along the
crystallographic a axis. The final model is of good geometry, with a final R-factor of 21.3% (R_free of 26.8%) for data
between 8.0 and 2.2 Å (see Table 3). A Ramachandran plot is given in Figure 21. The r.m.s. coordinate error is 0.282 as
calculated by SigmaA (Read, 1986). The average phase difference between the initial molecular replacement model and
the currently refined model is calculated to be 71° for data between 10 and 2.2 Å.
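A mean phase difference like the 71° quoted above folds each difference into the range 0-180° before averaging. A minimal sketch, assuming equal weights unless weights are supplied (the text does not say whether the 71° figure was weighted); for comparison, phases with no relation to the reference average about 90°.

```python
def mean_phase_difference(phi1_deg, phi2_deg, weights=None):
    """Weighted mean of |phi1 - phi2|, each difference folded into [0, 180] degrees."""
    if weights is None:
        weights = [1.0] * len(phi1_deg)
    total = 0.0
    for a, b, w in zip(phi1_deg, phi2_deg, weights):
        d = abs(a - b) % 360.0
        total += w * min(d, 360.0 - d)
    return total / sum(weights)


print(mean_phase_difference([10.0, 350.0, 0.0], [350.0, 10.0, 180.0]))
```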

The structure determination of PPCA is special in that two-fold averaging could be applied to refine very poor
molecular replacement phases, enabling us to retrieve electron density for 148 residues and 185 side chains per
monomer. In total, 314 complete residues were added per asymmetric unit, equivalent to about 35 kDa of protein. In
retrospect, we feel that a number of factors contributed to the successful structure determination.

Crystal Packing. Each monomer in the crystal interacts with four non-crystallographically related
monomers. By far the most extensive contact is with a non-crystallographically related monomer, generating the
physiological dimer. The three additional contacts are extensive crystal contacts ranging from 200-800 Å² averaged per
monomer. The largest non-dimer crystal contact involves the precursor loops from two crystallographically independent
monomers (region 265-267, 281-295 from monomer 1 with residues 281-293 from monomer 2) making intimate contact
with each other. Summed together, these loops create an intermolecular buried surface of 1680 Å². We believe that this
stabilizes an otherwise very flexible area, possibly explaining the good diffraction qualities of the P2₁2₁2 crystals.
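A buried surface such as the 1680 Å² quoted above is conventionally computed as SASA(A) + SASA(B) - SASA(A+B). The sketch below implements a basic Shrake-Rupley calculation with a 1.4 Å probe; atom radii, the number of sphere points and the use of the summed (rather than halved) value are assumptions of this illustration, not parameters given in the text.

```python
import numpy as np


def sphere_points(n=200):
    """Roughly uniform points on the unit sphere (golden-spiral construction)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    r = np.sqrt(1.0 - z * z)
    theta = np.pi * (1.0 + 5 ** 0.5) * i
    return np.column_stack((r * np.cos(theta), r * np.sin(theta), z))


def sasa(coords, radii, probe=1.4, n_points=200):
    """Shrake-Rupley solvent-accessible surface area in A^2."""
    coords = np.asarray(coords, dtype=float)
    radii = np.asarray(radii, dtype=float) + probe           # expanded radii
    unit = sphere_points(n_points)
    total = 0.0
    for i, (c, r) in enumerate(zip(coords, radii)):
        pts = c + r * unit                                   # candidate probe-centre positions
        others = np.delete(np.arange(len(coords)), i)
        d = np.linalg.norm(pts[:, None] - coords[others][None], axis=-1)
        buried = np.any(d < radii[others][None], axis=1)     # inside another expanded sphere
        total += 4.0 * np.pi * r * r * np.mean(~buried)
    return total


def buried_surface(coords_a, radii_a, coords_b, radii_b):
    """Buried area summed over both partners: SASA(A) + SASA(B) - SASA(A+B)."""
    both_c = np.vstack((coords_a, coords_b))
    both_r = np.concatenate((radii_a, radii_b))
    return sasa(coords_a, radii_a) + sasa(coords_b, radii_b) - sasa(both_c, both_r)


print(round(buried_surface([[0.0, 0.0, 0.0]], [1.6], [[3.0, 0.0, 0.0]], [1.6]), 1))
```

Some authors halve this quantity to report the area buried per partner, so the convention should be checked before comparing numbers.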

It is also in this crystal contact that we find the deviating spatial conformation and secondary structure between
the two monomers mentioned before. The electron density in this region is of very good quality, with average
temperature factors of 16.6 Å² for main-chain and 18.3 Å² for side-chain atoms.

pPPCA and the Hydrolase Family. The fold of pPPCA belongs to the large hydrolase fold family containing
enzymes such as the serine carboxypeptidases, dehalogenase, various lipases and acetylcholinesterase (Ollis et al.
(1992), infra), which have various catalytic functions. Though the central core is the same (a central β-sheet flanked
by α-helices on both sides), the proteins in this family all seem to have different 'cap' domains, with respect to both
fold and size (Figure 7A-F). pPPCA has one of the largest cap domains, comprising 121 residues forming the
three-helical bundle of the helical subdomain and a three-stranded β-sheet of the maturation subdomain.

Major Differences and Comparison With the Serine Carboxypeptidases. The overall fold of the pPPCA
monomer is similar to that of the wheat and yeast serine carboxypeptidases (Endrizzi et al. (1994), infra; Ollis et al.
(1992), infra). The complete core domains of pPPCA and CPW superimpose with an r.m.s. deviation of 1.7 Å for 302
Cα atoms, with 38% sequence identity. Deleting major deviating loops from the core domain allows pPPCA to
superimpose with an r.m.s. deviation of 1.2 Å onto CPW and CPY (293 equivalent Cα atoms with 40% sequence identity
for CPW/pPPCA, and 271 equivalent Cα atoms with 42.2% identity for CPY/pPPCA).
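The r.m.s. deviations quoted for superimposed Cα atoms come from an optimal rigid-body fit of equivalent atom pairs. The sketch below uses the Kabsch (SVD) algorithm; the step that decides which Cα atoms count as equivalent (for example the 302, 293 or 271 pairs above) is not shown, and this section does not name the program actually used for these superpositions. The example coordinates are synthetic.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """Superimpose q onto p (Kabsch algorithm); return (R, t, rmsd) with p ~ q @ R.T + t.

    p, q: (n, 3) arrays of equivalent atoms, e.g. matched C-alpha pairs.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    pc, qc = p.mean(axis=0), q.mean(axis=0)
    h = (q - qc).T @ (p - pc)                  # 3x3 covariance matrix
    u, s, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # guard against an improper rotation
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    trans = pc - rot @ qc
    fitted = q @ rot.T + trans
    rmsd = np.sqrt(np.mean(np.sum((fitted - p) ** 2, axis=1)))
    return rot, trans, rmsd


ang = np.radians(30.0)
rz = np.array([[np.cos(ang), -np.sin(ang), 0.0],
               [np.sin(ang), np.cos(ang), 0.0],
               [0.0, 0.0, 1.0]])
p = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]], dtype=float)
q = p @ rz.T + np.array([5.0, -2.0, 1.0])      # rotated and shifted copy of p
rot, trans, rmsd = kabsch_rmsd(p, q)
print(rmsd)
```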

The cap domain in pPPCA differs significantly from its CPW and CPY counterparts. The pPPCA structure
reveals a large maturation subdomain not present in the structures of CPW and CPY, for which the structures of the
enzymatically active forms are known. All three enzymes contain a three-helical bundle in the cap domain. The
sequence identity between the three proteins in this region is very low (ca. 12%). In contrast, PPCA shows a much
greater deviation. Hα1 superimposes reasonably well with its CPW counterpart, maintaining the same general orientation
with respect to the core domain (requiring a rotation of only 7.4°). But helices Hα2 and Hα3 have undergone major
rotations with respect to Hα1 and the core domain, by κ = 28.5° and κ = 93.4°, respectively (Figure 8A).
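The rotation angles κ quoted here (like the κ = 73° relating the two search-model cores earlier) are the angle of a rotation about its own axis. Given a 3×3 rotation matrix, for instance from superimposing corresponding helices, κ and the axis follow directly from the matrix. The sketch assumes κ is not close to 0° or 180°, where the axis formula below becomes ill-conditioned; the example matrix is synthetic.

```python
import math

import numpy as np


def kappa_and_axis(rot):
    """Rotation angle kappa (degrees) and unit axis of a 3x3 rotation matrix."""
    rot = np.asarray(rot, dtype=float)
    cos_k = max(-1.0, min(1.0, (np.trace(rot) - 1.0) / 2.0))   # cos(kappa) = (tr R - 1) / 2
    kappa = math.acos(cos_k)
    axis = np.array([rot[2, 1] - rot[1, 2],
                     rot[0, 2] - rot[2, 0],
                     rot[1, 0] - rot[0, 1]]) / (2.0 * math.sin(kappa))
    return math.degrees(kappa), axis


a = math.radians(93.4)
rz = np.array([[math.cos(a), -math.sin(a), 0.0],
               [math.sin(a), math.cos(a), 0.0],
               [0.0, 0.0, 1.0]])
print(kappa_and_axis(rz))
```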

Due to the integral role of the cap domain in forming the dimer interface, the dimers of PPCA and CPW were
compared. In the pPPCA and CPW dimers the monomers are oriented differently with respect to each other.
Superposition of the core domain of one monomer from each dimer shows that the second monomers (forming
the respective dimers) differ by a remarkable 15° in orientation (Figure 8B). Thus, it appears that the extensive
differences in the cap domains lead to a different arrangement of the subunits in the dimers of PPCA and CPW.

Catalytic Triad and Enzymatic Mechanism. Our structure shows that the precursor PPCA has all the elements
proposed for the enzymatic machinery of the serine carboxypeptidase family (Liao et al. (1992), infra; Endrizzi et al.
(1994), infra), and it is now the third structure elucidated belonging to this family of enzymes, after CPW and CPY.
The catalytic triad in the active site of pPPCA is formed by residues Ser 150, His 429 and Asp 372. The Oγ of Ser 150
forms a good hydrogen bond with the Nε2 of His 429, with an N to O distance of 2.8 Å. The Nδ1 of His 429 is 2.7 Å
from the Oδ2 and 3.3 Å from the Oδ1 of Asp 372. Further, two backbone amides appear to orient the carboxylate group
of Asp 372: the N of Ala 374 is at a distance of 3.0 Å from the Oδ1 of Asp 372, and the N of Cys 375 is at a distance
of 2.9 Å from the Oδ2 of Asp 372.

The oxyanion hole proposed to stabilize the negatively charged tetrahedral intermediate in serine
carboxypeptidases is formed by the backbone amides of Gly 57 and Tyr 151 in PPCA. The 32 atoms of the catalytic
triad residues plus the oxyanion hole amides from PPCA, CPY and CPW superimpose with an r.m.s. deviation of 0.4 Å,
indicating the very high degree of structural similarity of the active site in the PPCA precursor to those in the fully
active enzymes CPY and CPW (see Table 4). The carboxylate of Asp 372 and the imidazole of His 429 in PPCA are
non-planar, making an angle of approximately 60° between the imidazole and the carboxylate. A similar non-planarity
has been observed in CPW and CPY, in contrast to the planar orientation found in subtilisin- and trypsin-type serine
proteases (McPhalen et al., Biochemistry 27:6582-6598 (1988)).
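The roughly 60° angle between the Asp 372 carboxylate and the His 429 imidazole is an angle between two planes. A minimal sketch, assuming each plane is defined by three atoms (for example CG, OD1, OD2 of Asp and CG, ND1, NE2 of His); a least-squares plane through all ring atoms would be more robust. The coordinates in the example are synthetic.

```python
import numpy as np


def plane_normal(a, b, c):
    """Unit normal of the plane through three points."""
    n = np.cross(np.subtract(b, a), np.subtract(c, a))
    return n / np.linalg.norm(n)


def interplanar_angle(plane1, plane2):
    """Angle between two planes in degrees, folded into [0, 90]."""
    n1, n2 = plane_normal(*plane1), plane_normal(*plane2)
    cos_ang = min(1.0, abs(float(np.dot(n1, n2))))
    return float(np.degrees(np.arccos(cos_ang)))


xy_plane = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
tilted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, np.cos(np.radians(60)), np.sin(np.radians(60)))]
print(interplanar_angle(xy_plane, tilted))
```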
In pPPCA, a pair of glutamic acid residues (Glu 69 and Glu 149) is positioned near the catalytic triad, with their
carboxylate groups interacting with each other. The carboxylate groups are located approximately 8 Å from the Oγ
of Ser 150 and lie at the bottom of the active site. An asparagine (Asn 55) is oriented such that it forms a hydrogen
bond to each of the two carboxylate groups of the glutamic acid pair, at Nδ2 (Asn) to Oε1/Oε2 (Glu) distances of 3.0
and 3.6 Å, respectively. In addition, the two carboxylates interact with each other via hydrogen bonds. This
configuration of two glutamic acid residues and an asparagine is conserved among pPPCA, CPW and CPY (see Table 4),
and has been implicated in regulating the low pH optimum for the carboxypeptidase activity found in the serine
carboxypeptidases (Liao et al. (1992), infra). Biochemical data have suggested that a functional group with an apparent
pKa value of 5.5 functions to bind the C-terminal carboxylate group of peptide substrates and is responsible for the
observed pH optimum of 5.5 (reviewed in Breddam et al. (1986), infra; Rawlings & Barrett (1994), infra). Together
with their structural data, Liao and colleagues (Liao et al. (1992), infra) have suggested that at pH 5.5 or below one or
both glutamates must be uncharged, while at a pH higher than 5.5 one or both of the carboxylates, which are oriented
opposite to each other, may become deprotonated, resulting in unfavorable electrostatic interactions. This would disturb
the hydrogen-bonding pattern or result in structural perturbations causing the observed increase in Km for peptide
substrates at high pH. In pPPCA the orientation of this pair of glutamic acids, as well as that of the asparagine, is
essentially identical to that of the equivalent residues in CPW and CPY (see Table 4), even though the structure was
determined at pH 8. The CPW and CPY structures were determined at pH 5.7 and at pH 6.5-7.0. Thus, our structure
appears to rule out large pH-induced conformational changes of these three residues, at least up to a pH value 2.5 units
above that optimal for carboxypeptidase activity. However, the high degree of conservation of these residues does
indicate some role in a characteristic shared by all three enzymes.
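The pH argument can be made roughly quantitative with the Henderson-Hasselbalch relation for a single titratable group. The sketch below uses the apparent pKa of 5.5 from the text; it is an idealization, since the actual pKa values of the individual Glu 69/Glu 149 carboxylates in the protein are not given here, and two coupled groups would not titrate independently.

```python
def fraction_deprotonated(ph, pka):
    """Henderson-Hasselbalch: fraction of an acidic group present as the conjugate base."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


for ph in (5.5, 7.0, 8.0):
    print(ph, round(fraction_deprotonated(ph, pka=5.5), 3))
```

Under this idealization such a group would be half deprotonated at the pH optimum and more than 99% deprotonated at pH 8, the pH at which the pPPCA structure was determined.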

From our comparison it is clear that the enzymatic machinery in the PPCA precursor form is in a conformation
virtually identical to that found in the fully active CPW and CPY enzymes. On this basis, the conformation of the
enzymatic machinery found in pPPCA is expected to faithfully represent the conformation that will be found in
active PPCA.
Active Site, Substrate Specificity. PPCA has a substrate preference for hydrophobic residues in the P1 and/or
P1' binding pockets (Jackman et al., Hypertension 21:925-928 (1993)). In CPW the P1' pocket was identified to consist
of two tyrosine residues (Tyr 60 and Tyr 239), which form a long channel capped at the end by two acidic residues
(Glu 272 and Glu 398) (Liao et al. (1992), infra). This explains the highest preference of this enzyme for Arg and Lys
as the leaving group (Breddam et al., Carlsberg Res. Commun. 32:297-311 (1987)). In CPY a similarly shaped pocket is
formed by the residues Thr 60, Tyr 256, Leu 272 and Met 398 (Endrizzi et al. (1994), infra). In PPCA the analogous
residues are Tyr 247 and Asp 64, forming the sides of the pocket, with Met 430 and Thr 304 at the far end. This is
reasonably consistent with an overall preference of PPCA for a hydrophobic leaving group.

Inactivation Mechanism of the Precursor Form. During the maturation step of the PPCA precursor form, at most
residues 285-298, forming the 'excision' peptide, are removed by an as yet unidentified protease(s). In vitro, the
maturation event can be mimicked by digestion with trypsin, probably utilizing positions Arg 284, as well as Arg 292
and/or Arg 298. The residues forming the 'excision' peptide adopt distinctly different conformations in the two
crystallographically distinct monomers forming the PPCA dimer in our crystal structure. Yet in both monomers this
polypeptide region extends out from the protein surface and is virtually completely solvent- and protease-accessible
(Figure 9). Arg 284 and Arg 292 are particularly well exposed. The main-chain atoms of Arg 298 are less accessible,
being sandwiched between the strand Mβ2 and a loop N-terminal to helix Cα6, while a salt bridge with Glu 264 renders
the side-chain atoms of Arg 298 partially solvent-inaccessible.

The active site cleft is blocked by numerous residues from the maturation subdomain in the precursor form of
PPCA. The catalytic triad is rendered solvent-inaccessible by residues Asn 275, Ile 276 and Phe 277. These residues
are part of the polypeptide Asp 272-Phe 277, which we call the 'blocking' peptide. This peptide is held down
predominantly by hydrophobic contacts of Leu 273, Ile 276 and Phe 277 with the core domain residues Gly 57, Cys 60,
Leu 180, Leu 190, Val 191, Leu 232, Val 235, Ile 246, Leu 280, Leu 282, Met 299 and Ala 373 (Fig. 10). In addition,
residue Asn 275 of the blocking peptide appears to fill what might be part of the P1 binding pocket in the mature form.
Further inspection of the blocking peptide suggests that Gly 274, with Ramachandran angles φ = 66° and ψ = 28°, might
play a central role in the strand blocking the active site. A glycine at this position appears critical to allow the
polypeptide chain to adopt a conformation with its main chain at a safe distance from the catalytic triad. This might aid
in allowing the blocking peptide to assume a conformation resistant to autocatalysis. The P1' binding pocket seems to
be beautifully filled by Pro 301 interacting with Thr 304, Tyr 247, Cys 60 and Cys 334. Thus substrate binding is not
possible in the precursor form, due to the inaccessibility of the substrate binding pockets.
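Backbone angles such as φ = 66° and ψ = 28° for Gly 274 are dihedral angles over four consecutive backbone atoms (φ: C(i-1)-N-Cα-C; ψ: N-Cα-C-N(i+1)). A minimal NumPy sketch of the signed dihedral calculation follows; the test points are synthetic, not PPCA coordinates.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points (IUPAC sign convention)."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1) * b1      # component of b0 perpendicular to the central bond
    w = b2 - np.dot(b2, b1) * b1      # component of b2 perpendicular to the central bond
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Positive φ values like the one reported for Gly 274 fall in the left-handed region of the Ramachandran plot, which is most readily accessible to glycine.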
We conclude that the inactivation mechanism of PPCA is based on blocking of the active site, and not upon
changes in the position of functional groups involved in catalysis/transition-state stabilization. The P1, P2 and P1'
binding pockets are all rendered solvent-inaccessible. The function of the blocking peptide seems to be to render the
catalytic triad, as well as the region around the P1 and P2 binding pockets, solvent-inaccessible. The blocking peptide,
however, does not assume a conformation that a peptide substrate would adopt. It is carefully positioned in a manner
which is different from that of a productive substrate, thereby avoiding being cleaved by the nearby catalytic residues,
which are correctly poised for catalysis. A crucial observation is that the excision peptide itself does not bind in the
active site cleft. Hence, mere removal of the excision peptide alone is not sufficient to allow solvent or substrate
access to the active site.

Proposed Maturation Event and Extent of Conformational Rearrangement. The active site of the precursor
of PPCA appears to be fully blocked by 49 residues of the maturation subdomain, as shown in Figure 11. Based on the
precursor structure and the comparison with CPW and CPY, it is proposed that a region comprising approximately
residues 254-284 rearranges to free the P1 and P2 binding sites, while residues 299-302 rearrange to free the P1'
binding pocket. The linker connecting these two segments of polypeptide chain is the 14-amino-acid excision peptide
Met 285-Arg 298. The extent of the rearrangement is likely to be limited by a disulfide bridge between Cys 253 and
Cys 303, which is conserved in the serine carboxypeptidase family. This critical disulfide serves to keep the secondary
structure elements together at the far end of the P1' pocket.

An interesting pair of salt bridges is observed between Arg 262, Asp 300, Glu 264 and Arg 298, four residues
located on strands Mβ1 and Mβ3 of the mixed β-sheet found in the maturation subdomain. This cluster of residues is
strategically positioned at the base of the excision peptide, close to the core domain, 'shielding' the mixed β-sheet via
side-chain interactions (see Figure 11). These residues are strictly conserved among the human, mouse and chicken
PPCAs (Galjart et al. (1991), infra). This charge cluster may be affected by a shift from neutral to acidic pH. Arrival
in the endosome/lysosome is expected to result in protonation of the Asp or the Glu residue or both, resulting in
unfavorable electrostatic interactions and destabilization of this charge cluster. This in turn is expected to promote
partial unfolding of the maturation subdomain, allowing easier access to additional potential cleavage sites and
stimulating removal of the 'blocking' peptide from the active site, which it fills in the precursor.

A similar double salt bridge has been observed in the aspartic proteinase zymogen pepsinogen, between the
proenzyme segment (Arg 8P) and the enzyme (Arg 308, Glu 13, Asp 304).

The maturation mechanism of pPPCA appears to be novel among proteases for which the three-dimensional
structure of the zymogen is known. The catalytic triad in the precursor form is in a catalytically competent
conformation. Enzymatic activity is prevented by a 'blocking' peptide. The blocking peptide is, however, different from
the excision peptide and is not excised from the mature enzyme. This leads to a distinct difference from the other known
maturation mechanisms: after disappearance of the excision peptide, up to 35 residues filling the active site cleft in the
PPCA precursor must rearrange to render the catalytic triad solvent-accessible (see Figure 12), but are not cleaved off.
Removal of the excision peptide, and possibly a shift to lower pH in the endosome/lysosome, appears to be a trigger for
this event. The mechanism does not appear to be autocatalytic, as uptake experiments with cultured galactosialidosis
fibroblasts have shown that a mutant PPCA with the catalytic Ser 150 mutated to Ala is properly targeted and
processed. It retains its protective function and, except for the loss of catalytic activity, is biochemically
indistinguishable from the wild-type enzyme (Galjart et al. (1991), infra). Surprisingly, the maturation mechanisms of the
serine carboxypeptidases PPCA, CPW and CPY may all differ from each other as well. This is clearest for CPY, in which
a 91-residue polypeptide is cleaved off N-terminally to convert the zymogen to an active enzyme (Winther and Sorensen,
Proc. Natl. Acad. Sci. USA 55:9330-9334 (1991)), as opposed to the excision of a peptide from within the zymogen,
generating a two-chain active form, as is the case for PPCA and CPW.

Looking at the hydrolase fold family, the catalytic triad is housed in the core domain, and the various cap
domains modulate the biological function by influencing entirely different properties, such as: (i) enzyme kinetics,
exemplified by the interfacial activation of lipases (Smith et al., Curr. Opinion in Structural Biology 2:490-496 (1992));
(ii) substrate channeling, as proposed for acetylcholinesterase (Sussman et al. (1991), infra); (iii) substrate
recognition, as proposed for dehalogenase by Franken et al. (1991), infra, and for CPY and CPW by Endrizzi et al.
(1994), infra; and (iv) enzyme inactivation in the case of PPCA.

Biological Implications. Deficiency of the protective protein/cathepsin A (PPCA) in humans results in the
lysosomal storage disease galactosialidosis. PPCA is thought to form a multi-enzyme complex with β-galactosidase and
neuraminidase in the lysosomes, protecting the latter glycosidases in their harsh acidic and protease-rich environment.
PPCA has 30% sequence identity to the wheat serine carboxypeptidase (CPW) and the yeast serine carboxypeptidase
(CPY). It has been shown that PPCA in the precursor form is inactive, but upon maturation, entailing excision of a
2 kDa peptide, carboxypeptidase activity is released.

The precursor structure reveals an inactivation mechanism that has not been seen before in any of the other
known zymogen structures of proteases (available for the serine, metallo- and aspartic protease classes). The catalytic
triad seems to have an arrangement poised for catalysis. However, the triad is rendered solvent- and
substrate-inaccessible by a strand from the maturation subdomain binding in the active site cleft. Surprisingly, this
strand, called the 'blocking' peptide, does not overlap with the 2 kDa 'excision' peptide. Hence, after removal of the
excision peptide, up to 35 additional residues must rearrange in order to unblock the active site cleft. A strategically
positioned pair of salt bridges, comprising Arg 262, Arg 298, Glu 264 and Asp 300 at the base of the excision peptide,
is expected to become destabilized at low pH, unraveling this region of the structure, allowing easier access to cleavage
sites and/or promoting the rearrangement event.

A number of research groups are currently involved in designing enzyme and gene therapy procedures for
several lysosomal storage diseases. Insight into the three-dimensional structure, protein functioning and stability of
PPCA, the first enzyme of known structure associated with a lysosomal storage disease and the third human lysosomal
structure to be determined, may prove useful in future designs of an adequate therapy procedure for galactosialidosis.
Information from the three-dimensional structure of PPCA might also aid in designing an engineered form of PPCA
with increased stability and a longer half-life.
Table 1: X-ray Data Collection Statistics

resolution                                     32.27-2.2 Å
wavelength                                     1.08 Å
space group                                    P2₁2₁2
unit cell                                      a = 115.04, b = 148.11, c = 80.97 Å
temperature of data collection                 -178°C
No. of observed reflections                    436,709
No. of unique reflections                      67,740
completeness of all data                       95.7%
R_merge* for all data                          5.1%
completeness of outer shell (2.26-2.20 Å)      87.0%
R_merge* in outer shell (2.26-2.20 Å)          13.0%

* $R_{\mathrm{merge}} = \sum_h \sum_i |I_i(h) - \langle I(h) \rangle| \,/\, \sum_h \sum_i I_i(h)$, where $I_i(h)$ is the i-th observation for reflection h
and $\langle I(h) \rangle$ is the weighted mean of all the observations.
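The R_merge definition in the footnote to Table 1 can be made concrete with a short Python sketch. It assumes the observations have already been reduced to unique (asymmetric-unit) Miller indices. The footnote does not say how the weighted mean is formed, so inverse-variance weights 1/σ² are an assumption here.

```python
from collections import defaultdict


def r_merge(observations):
    """R_merge = sum_h sum_i |I_i(h) - <I(h)>| / sum_h sum_i I_i(h).

    observations: iterable of (hkl, intensity, sigma) tuples, where hkl is
    assumed to be already mapped to a unique (asymmetric-unit) index so that
    symmetry-equivalent measurements share the same key.
    """
    by_reflection = defaultdict(list)
    for hkl, intensity, sigma in observations:
        by_reflection[hkl].append((intensity, sigma))

    numerator = 0.0
    denominator = 0.0
    for obs in by_reflection.values():
        # Weighted mean <I(h)>, assuming inverse-variance weights 1/sigma^2.
        weights = [1.0 / sigma ** 2 for _, sigma in obs]
        mean_i = sum(w * i for w, (i, _) in zip(weights, obs)) / sum(weights)
        for intensity, _ in obs:
            numerator += abs(intensity - mean_i)
            denominator += intensity
    return numerator / denominator
```

As written, a reflection measured only once adds nothing to the numerator but still counts in the denominator. Some data-reduction programs exclude such singletons instead, which gives a slightly higher value.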
Table 2: Course of Model Building

                             nr. of   nr. of side   mask volume   statistics using data between 8.0 and 3.0 Å
Model                        Cα       chains        (10⁴ Å³)      R-factor (%)   R-free (%)   CC      CC-free
mol. repl. (mr)              331      125           -             54.2           55.3         0.243   0.244
rigid body ref. (mr)                                              52.6           52.9         0.287   0.318
calculate NCS matrix
best monomer (bm)            294      228           -             55.9           57.4         0.228   0.216
rigid body ref.                                                   53.5           55.0         0.320   0.328
update NCS matrix
bmc1 (mask 1)                373      258           10.8          49.9           51.3         0.403   0.424
bmc2 (mask 1)                405      277           10.8          48.6           48.4         0.443   0.478
bmc3 (mask 2)                411      307           9.99          47.1           48.6         0.471   0.491
rigid body ref.                                                   46.9           48.4         0.476   0.492
positional ref. (pbmc3)                                           39.4           44.7         0.622   0.562
update NCS matrix
bmc4 (mask 1)                412      327           10.8          41.7           43.1         0.584   0.585
bmc5 (mask 3)                435      387           8.88          39.8           40.6         0.621   0.623
bmc6 (mask 4)                442      413           9.11          38.4           40.2         0.647   0.637

Summary of the bootstrapping procedure. The resulting models are listed chronologically, starting
with the molecular replacement solution, i.e. mr (molecular replacement), bm (best monomer core), and
the bootstrapping cycles bmc1 through bmc6. The following statistics are given for the various models:
the number of Cα atoms built per monomer, the number of correct side chains incorporated per monomer,
and the volume of the molecular mask used during the averaging, if applicable. The quality of each model
is assessed using the R-factor, R-free, CC and CC-free calculated by X-PLOR for data between 8.0 and 3.0 Å.
After positional refinement of model bmc3, both monomers were made equivalent by taking one monomer
and generating the non-crystallographically related one.
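The legend to Table 2 assesses each model using crystallographic R-factors and correlation coefficients (CC) calculated by X-PLOR for data between 8.0 and 3.0 Å. The sketch below shows the conventional definitions in plain Python. It is not X-PLOR's code. The choice of amplitudes rather than intensities, the least-squares scale factor and the free test set (the hypothetical `is_free` flag) are all assumptions.

```python
import math


def ls_scale(f_obs, f_calc):
    """Least-squares scale k minimising sum (|Fo| - k|Fc|)^2."""
    num = sum(abs(fo) * abs(fc) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fc) ** 2 for fc in f_calc)
    return num / den


def r_factor(f_obs, f_calc, k=1.0):
    """Crystallographic R = sum | |Fo| - k|Fc| | / sum |Fo|."""
    num = sum(abs(abs(fo) - k * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return num / sum(abs(fo) for fo in f_obs)


def correlation(f_obs, f_calc):
    """Pearson correlation coefficient between |Fo| and |Fc|."""
    xo = [abs(v) for v in f_obs]
    xc = [abs(v) for v in f_calc]
    mean_o = sum(xo) / len(xo)
    mean_c = sum(xc) / len(xc)
    cov = sum((a - mean_o) * (b - mean_c) for a, b in zip(xo, xc))
    var_o = sum((a - mean_o) ** 2 for a in xo)
    var_c = sum((b - mean_c) ** 2 for b in xc)
    return cov / math.sqrt(var_o * var_c)


def work_and_free_stats(f_obs, f_calc, is_free):
    """Return (R_work, R_free, CC_work, CC_free).

    is_free is a hypothetical per-reflection flag marking the test set that
    is excluded from refinement; the scale k is fitted on the working set only.
    """
    work = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, is_free) if not t]
    free = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, is_free) if t]
    work_o, work_c = zip(*work)
    free_o, free_c = zip(*free)
    k = ls_scale(work_o, work_c)
    return (r_factor(work_o, work_c, k), r_factor(free_o, free_c, k),
            correlation(work_o, work_c), correlation(free_o, free_c))
```

R falls and CC rises as the model improves. This is the trend visible from mr to bmc6. CC is independent of the overall scale, whereas R depends on it.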






Table 3: Current Status of the Model

Statistics for the data used in refinement

resolution (Å)     R-factor (%)     completeness (%)
8.0-4.3            22.4             85.7
4.3-3.5            19.0             89.1
3.5-3.0            20.6             89.1
3.0-2.8            21.3             87.9
2.8-2.6            22.3             86.1
2.6-2.4            22.2             84.0
2.4-2.3            22.7             81.3
2.3-2.2            24.0             78.3
8.0-2.2 (all)      21.3

Statistics for the model

molecules in the asymmetric unit:                 2
residues (out of 904 possible):                   902
sugars:                                           6
waters:                                           296
r.m.s.d. bond lengths (Å):                        0.012
r.m.s.d. bond angles (°):                         1.72
average B-values for main chain atoms (Å²):       16.6
average B-values for side chain atoms (Å²):       18.3
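Table 3 breaks the refinement R-factor down by resolution shell. Assigning a reflection to a shell requires its resolution d, which for an orthorhombic cell follows from 1/d² = (h/a)² + (k/b)² + (l/c)². The minimal Python sketch below uses the cell edges of Table 1 and the shell limits of Table 3. The (h, k, l, F_obs, F_calc) input format and the rule that a boundary value goes to the lower-resolution shell are assumptions.

```python
import math

# Orthorhombic unit-cell edges (Å) as reported in Table 1.
CELL = (115.04, 148.11, 80.97)

# Resolution shells (low, high) in Å, as listed in Table 3.
SHELLS = [(8.0, 4.3), (4.3, 3.5), (3.5, 3.0), (3.0, 2.8),
          (2.8, 2.6), (2.6, 2.4), (2.4, 2.3), (2.3, 2.2)]


def d_spacing(h, k, l, cell=CELL):
    """Interplanar spacing d (Å) for an orthorhombic cell:
    1/d^2 = (h/a)^2 + (k/b)^2 + (l/c)^2."""
    a, b, c = cell
    return 1.0 / math.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)


def shell_index(d, shells=SHELLS):
    """Index of the first shell with high <= d <= low, or None if d lies
    outside the 8.0-2.2 Å range used in refinement."""
    for i, (low, high) in enumerate(shells):
        if high <= d <= low:
            return i
    return None


def r_factor_by_shell(reflections, shells=SHELLS):
    """Per-shell R = sum | |Fo| - |Fc| | / sum |Fo|.

    reflections: iterable of (h, k, l, f_obs, f_calc) tuples with F_calc
    already on the scale of F_obs (hypothetical input format).
    Shells without reflections are reported as None.
    """
    num = [0.0] * len(shells)
    den = [0.0] * len(shells)
    for h, k, l, fo, fc in reflections:
        if (h, k, l) == (0, 0, 0):
            continue  # the origin has no defined resolution
        i = shell_index(d_spacing(h, k, l), shells)
        if i is None:
            continue
        num[i] += abs(abs(fo) - abs(fc))
        den[i] += abs(fo)
    return [n / s if s > 0 else None for n, s in zip(num, den)]
```

For example, the 0 0 36 reflection has d = 80.97/36 ≈ 2.25 Å and falls in the outermost (2.3-2.2 Å) shell. That shell shows the highest R-factor and lowest completeness in Table 3.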
Table 4: Comparison of the catalytic machinery of PPCA with that of CPW (Liao et al., Biochemistry 31:9796-9812 (1992)) and CPY (Endrizzi et al. (1994))

PPCA       atom    CPW        ΔPPCA-CPW (Å)    CPY        ΔPPCA-CPY (Å)

Catalytic triad
Ser 150    N       Ser 146    0.3              Ser 146    0.4
           Cα                 0.4                         0.5
           C                  0.3                         0.4
           O                  0.3                         0.4
           Cβ                 0.9                         1.1
           Oγ                 1.5                         0.9
His 429    N       His 397    0.2              His 397    0.4
           Cα                 0.3                         0.4
           C                  0.3                         0.5
           O                  0.5                         0.6
           Cβ                 0.3                         0.6
           Cγ                 0.3                         0.5
           Cδ2                0.7                         0.5
           Cε1                0.4                         0.5
           Nδ1                0.3                         0.4
           Nε2                0.7                         0.5
Asp 372    N       Asp 338    0.2              Asp 338    0.2
           Cα                 0.1                         0.1
           C                  (illegible)                 0.1
           O                  (illegible)                 0.1
           Cβ                 0.3                         0.2
           Cγ                 0.2                         0.1
           Oδ1                0.2                         0.3
           Oδ2                0.4                         0.1

Proposed oxyanion hole formed by two backbone amides
Gly 57     N       Gly 53     0.1              Gly 53     0.5
           Cα                 0.2                         0.4
           C                  0.1                         0.4
           O                  0.3                         0.8
Tyr 151    N       Tyr 147    0.3              Tyr 147    0.2
           Cα                 0.2                         0.1
           C                  0.3                         0.2
           O                  0.5                         0.2

Proposed regulation of pH-dependent peptidase activity (deviations averaged over all atoms)
Asn 55             Asn 51     0.2              Asn 51     0.2
Glu 69             Glu 65     0.3              Glu 65     0.7
Glu 149            Glu 145    0.4              Glu 145    0.4

The residues forming the proposed catalytic machinery are strictly conserved between PPCA, CPW and CPY.
The deviation of each PPCA atom from the equivalent atom of CPW or CPY after superposition is given in Å.
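Table 4 reports per-atom deviations after superposing PPCA onto CPW and CPY. It does not state which superposition method or which atom set was used. As an illustration only, the sketch below uses the Kabsch least-squares method with NumPy. This is an assumed technique, not necessarily the authors' procedure. It returns the fitted rotation, the translation and the per-atom distances that would populate a ΔPPCA-CPW or ΔPPCA-CPY column.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Least-squares superposition of `mobile` onto `target` (Kabsch method).

    mobile, target: (N, 3) arrays of equivalent atom coordinates in Å.
    Returns (rot, trans, dev): the 3x3 rotation and translation such that
    mobile @ rot.T + trans best fits target, and the per-atom distances (Å)
    remaining after the fit.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mobile_centroid = mobile.mean(axis=0)
    target_centroid = target.mean(axis=0)
    p = mobile - mobile_centroid
    q = target - target_centroid

    # Singular value decomposition of the 3x3 cross-covariance matrix.
    u, _, vt = np.linalg.svd(p.T @ q)

    # Flip the last axis if needed so that rot is a proper rotation (det = +1).
    sign = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    trans = target_centroid - rot @ mobile_centroid

    fitted = mobile @ rot.T + trans
    dev = np.linalg.norm(fitted - target, axis=1)
    return rot, trans, dev
```

A row "averaged over all atoms" would correspond to `dev[i:j].mean()` over that residue's atoms. Fitting only the listed atoms, as this sketch does when given just those coordinates, is a simplification. In practice one would normally superpose on a larger conserved core and then measure deviations for the residues of interest. The returned (rot, trans) pair has the same x' = Rx + t form as the non-crystallographic symmetry operator used to generate one monomer from the other.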
What Is Claimed Is:

1. A method for crystallizing a human protective protein/cathepsin A (PPCA) or
precursor human protective protein/cathepsin A (pPPCA), comprising

(a) providing a purified PPCA or pPPCA;
(b) crystallizing the purified PPCA or pPPCA using a hanging drop or diffusion
method, to provide crystallized PPCA or pPPCA having biological activity,

wherein the crystallized PPCA or pPPCA is resolvable using x-ray crystallography to obtain
x-ray diffraction patterns suitable for three-dimensional structure determination of the PPCA or
pPPCA.

2. A method according to claim 1, wherein said PPCA or pPPCA has at least one
biological activity selected from the group consisting of enzyme protecting activity, enzyme
modulating activity and peptide hydrolyzing activity.

3. A method according to claim 1, wherein said crystallization step is done under
conditions of purified PPCA or pPPCA; 2-30% PEG 400-10,000; precipitating salt; buffers; and pH
7-9.

4. A method according to claim 3, wherein the crystallization conditions are PPCA or
pPPCA; 5-14% PEG 8000, 40-80 mM tromethamine, 0.05-2.0 mM NaN₃ and pH 8.0-8.3.

5. A crystallized PPCA or pPPCA, or at least one subdomain thereof, provided by a
method according to claim 1.

6. A method for providing an atomic model of a PPCA or pPPCA, comprising

(a) providing a computer readable medium having stored thereon atomic
coordinate/x-ray diffraction data of said PPCA or pPPCA in crystalline form, said data sufficient to
model the three-dimensional structure of said PPCA, said pPPCA, or at least one subdomain thereof;

(b) analyzing, on a computer using at least one subroutine executed in said computer,
the atomic coordinate/x-ray diffraction data from (a) to provide data output defining an atomic model
of said PPCA or said pPPCA, said analyzing utilizing at least one computing algorithm selected from
the group consisting of data processing and reduction, auto-indexing, intensity scaling, intensity
merging, amplitude conversion, truncation, molecular replacement, molecular alignment, molecular
refinement, electron density map calculation, electron density modification, electron map
visualization, model building, rigid body refinement, positional refinement; and

(c) obtaining atomic model output data defining the three-dimensional structure of
said PPCA, pPPCA or at least one subdomain thereof.

7. A method according to claim 6, wherein said computer readable medium further has
stored thereon data corresponding to a nucleic acid sequence or an amino acid sequence data
comprising at least one structural domain or a functional domain of a PPCA or pPPCA
corresponding to a portion of the amino acid sequences of Figures 13 or 14; and wherein said
analyzing step further comprises analyzing said sequence data.
8. A computer readable medium having stored thereon atomic model data of said PPCA
or pPPCA as the model output data produced by a method according to claim 6.

9. A computer-based system for providing atomic model data of the three-dimensional
structure of a PPCA or a pPPCA, comprising the following elements:

(a) a computer readable medium having stored thereon atomic coordinate/x-ray
diffraction data of said PPCA or pPPCA or at least one subdomain thereof;

(b) at least one computing subroutine that, when executed in a computer, causes the
computer to analyze the atomic coordinate/x-ray diffraction data from (a) to provide data output
defining an atomic model of said PPCA or pPPCA, said at least one computing
subroutine selected from the group consisting of data processing and reduction, auto-indexing,
intensity scaling, intensity merging, amplitude conversion, truncation, molecular replacement,
molecular alignment, molecular refinement, electron density map calculation, electron density
modification, electron map visualization, model building, rigid body refinement, positional
refinement; and

(c) retrieval means for obtaining atomic model output data defining the three-dimensional
structure of said PPCA, pPPCA or at least one subdomain thereof.

10. A computer-based system according to claim 9, wherein said computer readable
medium further has stored thereon data corresponding to a nucleic acid sequence or an amino acid
sequence data comprising at least one structural domain or a functional domain of a PPCA or
pPPCA corresponding to a portion of the amino acid sequences of Figures 13 or 14, and wherein said
at least one subroutine further includes analyzing said sequence data.

11. A computer readable medium, having stored thereon atomic model data of a PPCA,
pPPCA, or at least one subdomain thereof, produced by a computer system according to claim 9.

12. A method for providing a computer atomic model of a ligand of a PPCA or pPPCA,
comprising

(a) providing a computer readable medium according to claim 11, having stored
thereon atomic model data of a PPCA, a pPPCA or at least one subdomain thereof;

(b) providing a computer readable medium having stored thereon atomic model data
sufficient to generate atomic models of potential ligands of PPCA or pPPCA;

(c) analyzing on a computer, using at least one subroutine executed in said computer,
the atomic model data from (a) and the ligand data from (b), to determine binding sites of PPCA or
pPPCA and to provide data output defining an atomic model of a ligand of said PPCA, pPPCA or
at least one subdomain thereof, said analyzing utilizing computing subroutines selected from the
group consisting of data processing and reduction, auto-indexing, intensity scaling, intensity
merging, amplitude conversion, truncation, molecular replacement, molecular alignment, molecular
refinement, electron density map calculation, electron density modification, electron map
visualization, model building, rigid body refinement, positional refinement; and

(d) obtaining atomic model output data defining the three-dimensional structure of
a ligand of said PPCA, pPPCA or at least one subdomain thereof.

13. A computer readable medium having stored thereon the model output data produced
by a method according to claim 12.

14. An isolated PPCA or pPPCA ligand, corresponding to the physical molecule of the
atomic model of the ligand model produced by a method according to claim 12.

15. A computer-based system for providing an atomic model of a ligand of a PPCA or
pPPCA, comprising the following elements:

(a) a computer readable medium having stored thereon atomic model data of a PPCA
or pPPCA;

(b) a computer readable medium having stored thereon atomic model data sufficient
to generate atomic models of potential ligands of PPCA or pPPCA;

(c) at least one computing subroutine for analyzing on a computer the atomic model
data of PPCA or pPPCA from (a) and the ligand data from (b), to determine binding sites of PPCA
or pPPCA and to provide data output defining atomic models of potential ligands of PPCA or
pPPCA, said analyzing utilizing at least one computing subroutine selected from the group consisting
of data processing and reduction, auto-indexing, intensity scaling, intensity merging, amplitude
conversion, truncation, molecular replacement, molecular alignment, molecular refinement, electron
density map calculation, electron density modification, electron map visualization, model building,
rigid body refinement, positional refinement; and

(d) retrieval means for obtaining atomic model output data defining the atomic
models of potential ligands of PPCA or pPPCA.

16. A computer readable medium, comprising atomic model output data of a potential
ligand of PPCA or pPPCA, said data produced by a method according to claim 15.

17. An isolated PPCA or pPPCA ligand, corresponding to the physical molecule of the
atomic model of a ligand produced by a computer system according to claim 15.

18. A crystallized pPPCA, having the atomic coordinates presented in Figures 23.1-23.41.
Box II Observations where unity of invention is lacking (Continuation of item 2 of first sheet)

This International Searching Authority found multiple inventions in this international application, as follows:

See Extra Sheet.

As only some of the required additional search fees were timely paid by the applicant, this international search report covers

[ ] No protest accompanied the payment of additional search fees.
BOX II. OBSERVATIONS WHERE UNITY OF INVENTION WAS LACKING
This ISA found multiple inventions as follows:

This application contains the following inventions or groups of inventions which are not so linked as to form a single
inventive concept under PCT Rule 13.1. In order for all inventions to be searched, the appropriate additional search
fees must be paid.

Group I, claims 1-4, sharing the inventive concept of methods of crystallizing PPCA or pPPCA.
Group II, claims 5 and 18, sharing the inventive concept of PPCA protein.

Group III, claims 6-11, sharing the inventive concept of a method of providing an atomic model of PPCA.

Group IV, claims 12-13 and 15-16, sharing the inventive concept of a method of providing an atomic model of a ligand
for PPCA.

Group V, claims 14 and 17, sharing the inventive concept of a ligand for PPCA.

The inventions listed as Groups I-V do not relate to a single inventive concept under PCT Rule 13.1 because, under
PCT Rule 13.2, they lack the same or corresponding special technical features for the following reasons: The claims
are not so linked by a special technical feature within the meaning of PCT Rule 13.2 so as to form a single inventive
concept. The "special technical features" means those technical features that define a contribution over the prior art.
(See PCT Rule 13.2.) Because PPCA was known in the prior art (see description at page 1, lines 16-18), it cannot
form the basis of unity of invention. Therefore, the main invention which forms a single inventive concept is Group I,
claims 1-4, which is a method of crystallizing. Group II has the inventive concept of a PPCA protein. Group III has
the inventive concept of a method of providing an atomic model of PPCA, Group IV has the inventive concept of
providing an atomic model of a ligand for PPCA and Group V has the inventive concept of a ligand for PPCA; none of
these Groups shares the special technical feature of Group I; therefore, unity of invention is lacking.